Abstract

In this paper we define and examine the power of the conditional-sampling oracle in
the context of distribution-property testing. The conditional-sampling oracle for a discrete
distribution $\mu$ takes as input a subset $S \subseteq [n]$ of the domain, and outputs a random sample
$i \in S$ drawn according to $\mu$, conditioned on $S$ (and independently of all prior samples). The
conditional-sampling oracle is a natural generalization of the ordinary sampling oracle in
which $S$ always equals $[n]$.

We show that with the conditional-sampling oracle, testing uniformity, testing identity
to a known distribution, and testing any label-invariant property of distributions are easier
than with the ordinary sampling oracle. On the other hand, we also show that for some
distribution properties the sample complexity remains near-maximal even with conditional
sampling.



*Research supported in part by an ERC-2007-StG grant number 202405. 



1 Introduction 



In the last decade several works have investigated the problem of testing various properties of
huge data sets that can be represented as an unknown distribution from which independent
samples can be taken. In distribution-property testing, the goal is to distinguish the case where
the samples come from a distribution that has a certain property $\mathcal{P}$ from the case where the
samples come from a distribution that is far, in the variation distance, from any distribution that
has the property $\mathcal{P}$ (the variation distance between two distributions $\mu$ and $\mu'$ over a common set
$B$ is $\frac{1}{2}\sum_{i \in B} |\Pr_{\mu}[i] - \Pr_{\mu'}[i]|$, which is equal to the maximum difference in probability between
the distributions for any possible event). In the traditional setting no access is provided to the
distribution apart from the ability to take samples, and the two cases should be distinguished
using as few of them as possible.

There are several natural distribution properties that were studied in this context: testing 
whether a distribution is uniform [7], testing identity between distributions (taking samples from 
both) [HUH], testing whether a joint distribution is independent (a product of two distributions) 
[3] and more. Some useful general techniques have also been designed to obtain nearly tight
lower bounds on various distribution-property testing problems [12]. Other tightly related works
study the problems of estimating various measures of distributions, such as entropy [2], [8] or
support size [TTJ.

Most attention has been given to testing properties of distributions over very large (discrete) 
domains, where the need for sublinear time and sample complexities is vital. Distribution- 
property testers with a sublinear sample complexity are motivated by problems from various 
areas, such as physics, cryptography, statistics, computational learning theory, property testing 
of graphs and sequences and streaming algorithms (see the overview in |1U| for a comprehensive
list of references). Indeed, in many of the aforementioned works testers have been designed with
sublinear sample (and time) complexity, that is often of the form $n^{\alpha}$, where $n$ is the size of the
domain, and $\alpha$ is a positive constant smaller than 1.

While most previous works are focused on the ordinary sampling oracle, other stronger 
oracles were considered too. A major reason is that the number of required samples, while 
sublinear, is still very large in the original model. The most notable example is the oracle from 
[2], that also allows querying the exact probability weight of any element from the domain. 
Another research direction involved restricting the problem further, for example by adding the 
promise of the distribution being monotone [5]. 

In this work we study the problem of testing several distribution properties in an unrestricted 
setting while providing for a stronger oracle, that can be thought of as more natural than 
the one of [2] in some situations. Namely, we allow the samples obtained from the unknown 
distribution to be conditioned over specified subsets of the domain. In our setting, we assume
that a sampling oracle to the unknown distribution $\mu$ over the discrete domain $[n] = \{1, \ldots, n\}$
is provided, that allows us to sample random (according to $\mu$) elements conditioned on any
specified subset $S \subseteq [n]$. If the original distribution is described by the probabilities $p_1, \ldots, p_n$
(where the probability for obtaining $i \in [n]$ is $p_i$), then when restricting to $S$ the probability of
sampling $i \in [n]$ is $p_i/(\sum_{j \in S} p_j)$ if $i \in S$ and $0$ otherwise (see the formal definition of the model
and corresponding testers in Section 2).

In various scenarios, conditional samples can be obtained naturally, or come at a low cost
relative to that of extracting any sample (see some illustrating examples in Section 1.1). This
leads to the following natural question: can we reduce the sample complexity of distribution- 
property testers using conditional samples? 

Indeed, conditional sampling is more powerful than the traditional model: We show that with 
conditional samples several natural distribution properties, such as uniformity, can be tested 
in constant time (compared to $\Theta(\sqrt{n})$ unconditional samples even for uniformity [7], [5]). The
most general result of this paper (Section 4) is that any label-invariant property of distributions
(a symmetric property in the terminology of [12]) can be tested using $\mathrm{poly}(\log n)$ conditional
samples.¹

On the other hand, there are properties for which testing remains almost as hard as possible 
even with conditional samples: We show a property of distributions that requires at least $\Omega(n)$
conditional samples to test (Section 7).

Another feature that makes conditional samples interesting is that, in contrast to the testers
using ordinary samples, which are non-adaptive by definition, adaptivity (and the algorithmic
aspect of testing) plays an important role in the conditional-sampling model. For instance, the
aforementioned task of testing uniformity, while still possible with a much better sample
complexity than in the traditional model, cannot be done non-adaptively with a constant number
of samples (see Section 6.2).

Before we move to some motivating examples, let us address the concern whether arbitrary
conditioning is realistic: While the examples below do relate to arbitrary conditioning, sometimes
one would like the conditioning to be more restricted, in some sense describable by fewer
than the $n$ bits required to describe the conditioning set $S$. In fact, many of our algorithms
require less than that. For example, the adaptive uniformity test takes only unconditional samples
and samples conditioned on a constant-size set, so the description size per sample is in fact
$O(\log n)$, as there are $n^{O(1)}$ possibilities. The adaptive general label-invariant property tester
takes only samples conditioned to dyadic intervals of $[n]$, so here the description size is
$O(\log n)$ as well. The non-adaptive tests do require general conditioning, as they pick uniformly
random sets of prescribed sizes.

1.1 Some motivating examples 
Lottery machines 

The gravity pick lottery machine is the most common lottery machine used worldwide to pick
random numbers. A set $B$ of balls, each marked with a unique number $i \in \mathbb{N}$, are dropped into
the machine while it is spinning, and after a certain amount of time the machine allows a single
ball to drop out. Ensuring that such a machine is fair is an important real-life problem.²

Suppose that, given a machine and set of balls, we wish to test them for being fair. Specifi- 
cally, we would like to distinguish between the following cases: 

• The machine picks the balls uniformly at random, that is, for any subset $B' \subseteq B$ of balls
dropped into the machine, and for each $i \in B'$, the probability that the machine picks $i$
is $1/|B'|$;

• The distribution according to which the balls are picked is $\epsilon$-far from uniform (where $\epsilon > 0$
is some fixed constant, and the distance we consider is the standard variation distance
defined above).

Suppose furthermore that we wish to distinguish between those cases as quickly as possible, 
and in particular, within few activations of the machine. Compare the following solutions. 

We can use the uniformity tester [7] for this task. Obtaining each sample from the underlying
distribution (with the $p_i$'s) requires one activation of the machine (with the entire set $B$), and we
can complete the test using $O(\sqrt{|B|})$ activations.

¹ We say that $f(a_1, \ldots, a_l) = \mathrm{poly}(g_1(a_1, \ldots, a_l), \ldots, g_k(a_1, \ldots, a_l))$ if there exists a polynomial $p(x_1, \ldots, x_k)$
such that $f \le p(g_1, \ldots, g_k)$ for all values of $a_1, \ldots, a_l$ in their respective domains.

² As was demonstrated in the Pennsylvania Lottery scandal, see e.g.
http://en.wikipedia.org/w/index.php?title=1980_Pennsylvania_Lottery_scandal&oldid=496671681
Alternatively, using the algorithm we present in Section 3.1 with conditional samples, we can
complete the test using $O(1)$ activations only (the number of activations only has a polynomial
dependency on $\epsilon$ and is logarithmic in the confidence parameter). Assuming that the drawing
probabilities depend only on the physical characteristics of every ball separately, a conditional 
sample here corresponds to activating the machine with a specific subset of the balls rather 
than the entire set B. 

This is for testing uniformity. Using the algorithm from Section 4 we could also test for
any label-invariant property with $\mathrm{poly}(\log |B|)$ activations, which would allow us for example
to give an estimation of the actual distance of the distribution from being uniform.

Asymmetric communication scenarios 

Suppose that two computers A and B are linked with an asymmetric communication link, in 
which transmitting information in one of the directions (say from A to B) is much easier than 
in the other direction (consider e.g. a spacecraft traveling in remote space, with limited energy, 
computational power and transmitting capability; actually numerous examples of asymmetric 
communications also exist here on Earth). Now assume that B has access to some large data
that can be modeled as a collection of samples coming from an unknown distribution $\mu$, while A
wants to learn or test some properties of $\mu$. We could simulate the standard testing algorithms
by sending a request to B whenever a random sample from $\mu$ is needed. Assuming that the
most important measure of efficiency is how much information is sent by B, it would translate 
to the sample complexity of the simulated algorithm. 

However, if B can also produce conditional samples (for example if it has nearly unlimited 
cost-free access to samples from the distribution), then any property that is significantly easier 
to test with conditional samples can be tested with fewer resources here. 

Political polls 

We mention these here because the modern-day practice of polling actually uses conditional 
sampling. Rather than taking a random sample of all willing potential participants, the polling 
population is usually first divided to groups according to common traits, and then each such 
group is polled separately before the results are re-integrated into the final prediction. 

1.2 Informal description of results 

In all sample-complexity upper bounds listed below there is a hidden factor of $\log(\delta^{-1})$, where
$\delta$ is the maximal failure probability of the tester. Also, all lower bounds are for a fixed (and
not so small) $\epsilon$. The results are summarized in Table 1.



Upper bounds                 Adaptive                                  Non-adaptive
  Uniformity                 $\mathrm{poly}(\epsilon^{-1})$            $\mathrm{poly}(\log n, \epsilon^{-1})$
  Identity to known dist.    $\mathrm{poly}(\log^* n, \epsilon^{-1})$  $\mathrm{poly}(\log n, \epsilon^{-1})$
  Any label-invariant prop.  $\mathrm{poly}(\log n, \epsilon^{-1})$

Lower bounds                 Adaptive                                  Non-adaptive
  Uniformity and identity                                              $\Omega(\log\log n)$
  Any label-invariant prop.  $\Omega(\sqrt{\log\log n})$               (follows uniformity)
  General properties         $\Omega(n)$                               (follows adaptive)

Table 1: Summary of results.
Adaptive testing

The first result we prove is that uniformity, and more generally identity to any distribution that
is very close to uniform in the $\ell_\infty$ norm, can be tested (adaptively) with $\mathrm{poly}(\epsilon^{-1})$ conditional
samples (Theorem 3.1.1 and Theorem 3.1.2 respectively). This is done by capturing (for far
distributions) both "light" and "heavy" elements in the same small set and then conditioning
over it. Our next result is that identity to any known distribution can be tested adaptively
with $\mathrm{poly}(\log^* n, \epsilon^{-1})$ conditional samples, where $n$ is the size of the domain (Theorem 3.2.1).
This uses the uniformity result with the bucketing technique of [3] together with a recursive
argument.

Our most general result is that any label-invariant (i.e. invariant under permutation of
the domain) property of distributions can be tested adaptively with $\mathrm{poly}(\log n, \epsilon^{-1})$ conditional
samples (Theorem 4.0.1). In fact, we go further to prove the following stronger result: with
$\mathrm{poly}(\log n, \epsilon^{-1}, \log(\delta^{-1}))$ conditional samples taken from $\mu$, it is possible to compute a
distribution $\mu'$ that is $\epsilon$-close to $\mu$ up to some permutation of the domain $[n]$ (Theorem 4.0.2). For
showing this we construct an explicit persistent sampler that could be interesting in itself.
Essentially we construct a way to simulate (unconditional) samples from a distribution $\tilde{\mu}$ that is
close to $\mu$ and for which we can also provide exact probability queries like the oracle of [2].

Non-adaptive testing 

We prove that uniformity can be tested non-adaptively with $\mathrm{poly}(\log n, \epsilon^{-1})$ conditional samples.
Here too, the tester enjoys a certain degree of tolerance, in the sense that it is possible to test
identity with any distribution that is close enough to uniform (see Theorems 5.1.1 and 5.1.2).
This is by first proving (through bucketing) that a portion of the "total difference" of $\mu$ from
being uniform is in relatively equal-probability members of $[n]$, and then trying to capture just
a few of them in a random set of an appropriate size. We also prove (from the uniformity test
through standard bucketing arguments) that identity to any known distribution can be tested
non-adaptively with $\mathrm{poly}(\log n, \epsilon^{-1})$ conditional samples (Theorem 5.2.1).

Lower bounds 

As already mentioned in the introduction, adaptivity is useful when we have access to conditional
sampling. We demonstrate this by proving that testing uniformity non-adaptively requires
$\Omega(\log\log n)$ conditional samples, for some fixed $\epsilon > 0$ (Theorem 6.2.1). We also prove that
the tester for any label-invariant property (from our main result) cannot be improved to work
with a constant number of conditional samples: There is a label-invariant property which
requires $\Omega(\sqrt{\log\log n})$ samples to test, whether adaptively or not (Theorem 6.3.1). Our third
lower bound shows that for some properties conditional samples do not help much: There
are distribution properties that cannot be tested (adaptively) with $o(n)$ conditional samples
(Theorem 7.0.1). The first two lower bounds are through a special adaptation of Yao's method,
while the last one is through a reduction to general properties of Boolean strings, of which
maximally untestable examples are known.

About the gaps in the bounds 

We believe that for non-adaptive uniformity testing the upper bound is closer to the truth, in
that the actual complexity should be close to logarithmic in $n$. A more careful analysis of the
lower bound construction would be a good starting point towards narrowing the gap. We also
believe that the correct lower bound for adaptive testing of general label-invariant properties is
higher than our achieved one. Additionally we believe that an examination of the methods of
[12] should allow us to construct label-invariant properties for which testing in the traditional
(unconditioned) sampling model is nearly useless.

2 Preliminaries 

2.1 The conditional distribution testing model 

Let $\mu$ be a distribution over $\{1, \ldots, n\}$, its probabilities denoted by $p_1, \ldots, p_n$, where $p_i = \Pr_{\mu}[i]$.
We will also write $\mu(i)$ for $\Pr_{\mu}[i]$ where we deal with more than one distribution. The distribution
$\mu$ is not known to the algorithm explicitly, and may only be accessed by drawing samples. A
conditional distribution testing algorithm may submit any set $A \subseteq \{1, \ldots, n\}$ and receive a
sample $i \in A$ that is drawn according to $\mu$ conditioned on $A$ (and independent of any previous
samples).

Thus when a sample is drawn according to $\mu$ conditioned on $A$, the probability of getting $j$
is $\Pr[j \mid A] = p_j/(\sum_{i \in A} p_i)$ for $j \in A$ and $0$ for $j \notin A$. If $\sum_{i \in A} p_i = 0$ then we assume (somewhat
arbitrarily) that the algorithm obtains a uniformly drawn member of $A$.
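
The oracle just described can be simulated when the probabilities are known explicitly, which is handy for experimenting with testers. The following is a minimal Python sketch under stated assumptions: the domain is 0-based (indices 0..n-1 stand in for 1..n), and the function name `conditional_sample` and its `rng` parameter are illustrative choices, not part of the model.

```python
import random


def conditional_sample(p, A, rng=random):
    """Draw one index from A according to p conditioned on A.

    p   -- list of probabilities (0-based stand-in for p_1, ..., p_n)
    A   -- non-empty collection of distinct indices into p
    rng -- randomness source (hypothetical parameter, for reproducibility)
    """
    A = list(A)
    if not A:
        raise ValueError("conditioning set must be non-empty")
    weights = [p[j] for j in A]
    if sum(weights) == 0:
        # Zero-mass conditioning set: return a uniformly drawn member of A,
        # following the convention stated in the text.
        return rng.choice(A)
    # Otherwise index j in A is returned with probability p_j / sum_{i in A} p_i.
    return rng.choices(A, weights=weights, k=1)[0]
```

Calling it with `A = range(len(p))` gives an ordinary (unconditioned) sample.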

We measure farness using the variation distance: We say that $\mu$ is $\epsilon$-far from a property $\mathcal{P}$
of distributions over $\{1, \ldots, n\}$, if for every $\mu'$ that satisfies $\mathcal{P}$ and is described by $p'_1, \ldots, p'_n$ we
have $d(\mu, \mu') = \frac{1}{2}\sum_{i=1}^{n} |p_i - p'_i| \ge \epsilon$.
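
As a small worked illustration of this distance, the Python sketch below computes the variation distance of two explicitly given distributions, and also the probability gap of the event containing exactly the elements where the first distribution is heavier. The function names are illustrative; the two quantities coincide, which matches the remark in the introduction that the variation distance equals the maximum difference in probability over all events.

```python
def variation_distance(p, q):
    """Half the l1 distance between two probability vectors of equal length."""
    if len(p) != len(q):
        raise ValueError("distributions must share a domain")
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def max_event_gap(p, q):
    """Pr_p[E] - Pr_q[E] for the event E = {i : p_i > q_i}."""
    return sum(a - b for a, b in zip(p, q) if a > b)


p = [0.5, 0.5, 0.0, 0.0]
q = [0.25, 0.25, 0.25, 0.25]
d = variation_distance(p, q)   # half of 4 * 0.25
g = max_event_gap(p, q)        # event {0, 1}: 1.0 - 0.5
```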

We will consider two types of conditional distribution testing algorithms: non-adaptive
testers, which must decide the sets to sample from before getting any samples, and adaptive
testers, which have no such restriction.

Definition 2.1.1 (Non-adaptive tester). A non-adaptive distribution tester for a property $\mathcal{P}$
with conditional sample complexity $t : \mathbb{R} \times \mathbb{R} \times \mathbb{N} \to \mathbb{N}$ is a randomized algorithm that receives
$\epsilon, \delta > 0$, $n \in \mathbb{N}$ and a conditional sampling oracle to a distribution $\mu$ over $[n]$ and operates as
follows.

1. The algorithm generates a sequence of $t \le t(\epsilon, \delta, n)$ sets $A_1, \ldots, A_t \subseteq [n]$ (possibly with
repetitions).

2. Then it calls the conditional oracle $t$ times with $A_1, \ldots, A_t$ respectively, and receives
$j_1, \ldots, j_t$, where every $j_i$ is drawn according to the distribution $\mu$ conditioned on $A_i$,
independently of $j_1, \ldots, j_{i-1}$ and any other history.

3. Based on the received elements $j_1, \ldots, j_t$ and its internal coin tosses, the algorithm accepts
or rejects the distribution $\mu$.

If $\mu$ satisfies $\mathcal{P}$ then the algorithm must accept with probability at least $1 - \delta$, and if $\mu$ is $\epsilon$-far
from $\mathcal{P}$ then the algorithm must reject with probability at least $1 - \delta$.

Definition 2.1.2 (Adaptive tester). An adaptive distribution tester for a property $\mathcal{P}$ with
conditional sample complexity $t : \mathbb{R} \times \mathbb{R} \times \mathbb{N} \to \mathbb{N}$ is a randomized algorithm that receives
$\epsilon, \delta > 0$, $n \in \mathbb{N}$ and a conditional sampling oracle to a distribution $\mu$ over $[n]$ and operates as
follows.

1. For $i \in \{1, \ldots, t\}$, where $t \le t(\epsilon, \delta, n)$, at the $i$th phase the algorithm generates a set $A_i \subseteq [n]$ (based on
$j_1, \ldots, j_{i-1}$ and its internal coin tosses), and calls the conditional oracle with $A_i$ to receive
an element $j_i$, drawn according to the distribution $\mu$ conditioned on $A_i$, independently of
$j_1, \ldots, j_{i-1}$ and any other history.

2. Based on the received elements $j_1, \ldots, j_t$ and its internal coin tosses, the algorithm accepts
or rejects the distribution $\mu$.

If $\mu$ satisfies $\mathcal{P}$ the algorithm must accept with probability at least $1 - \delta$, and if $\mu$ is $\epsilon$-far from
$\mathcal{P}$ the algorithm must reject with probability at least $1 - \delta$.

As is standard in the field of property testing, the primary measure of efficiency of these 
testers is their sample complexity. 

2.2 Tools from previous works 

Our algorithms will make use of the Identity Tester of Batu et al. [3] (though it is important
to note that this result is used mainly as a "primitive" and can be replaced in the sequel with
taking enough samples to fully approximate the distribution).

Theorem 2.2.1 (Identity Tester). There is an algorithm $T$ for testing identity between an
unknown distribution $\mu'$ and a known distribution $\mu$, both over $[n]$, with (ordinary) sample
complexity $\tilde{O}(\sqrt{n}\,\mathrm{poly}(\epsilon^{-1})\log(\delta^{-1}))$. Namely, $T$ accepts with probability $1 - \delta$ if $\mu' = \mu$ and
rejects with probability $1 - \delta$ if $\mu'$ is $\epsilon$-far from $\mu$.

We will also use the following inequality, which appears as Theorem A.1.11 and Theorem
A.1.13 in 0:

Lemma 2.2.2. Let $p_1, \ldots, p_n \in [0, 1]$, $X_1, \ldots, X_n$ be fully independent random variables with
$\Pr[X_i = 1 - p_i] = p_i$ and $\Pr[X_i = -p_i] = 1 - p_i$, and let $p = \frac{1}{n}\sum_{i=1}^{n} p_i$ and $X = \sum_{i=1}^{n} X_i$. Then
$\Pr[|X| > a] < 2\exp(-a^2/(2pn))$.

When using this lemma we interpret $X + pn = \sum_{i=1}^{n} (X_i + p_i)$ as the number of successes in
$n$ independent trials where the probability of success in the $i$th trial is $p_i$.

Bucketing 

Bucketing is a general tool, introduced in [HE], that decomposes any explicitly given distribution 
into a collection of distributions that are almost uniform. In this section we recall the bucketing 
technique and the lemmas (from [H [3]) that we will need for our proofs. 

Definition 2.2.3. Given a distribution $\mu$ over $[n]$, and $M \subseteq [n]$ such that $\mu(M) > 0$, the
restriction $\mu{\restriction}_M$ is the distribution over $M$ with $\mu{\restriction}_M(i) = \mu(i)/\mu(M)$ (this is the same as
the conditioning of $\mu$ on $M$, only here we also change the domain).

Given a partition $\mathcal{M} = \{M_0, M_1, \ldots, M_k\}$ of $[n]$, we denote by $\mu_{\langle\mathcal{M}\rangle}$ the distribution over
$\{0\} \cup [k]$ in which $\mu_{\langle\mathcal{M}\rangle}(i) = \mu(M_i)$. This is the coarsening of $\mu$ according to $\mathcal{M}$.
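
Both operations are easy to state for an explicitly given distribution. The following Python sketch is illustrative only: it assumes the distribution is stored as a dictionary from domain elements to probabilities and a partition as a list of sets, and the function names are our own.

```python
def restriction(mu, M):
    """Restriction of mu to M: a distribution whose domain is M itself."""
    mass = sum(mu[i] for i in M)
    if mass <= 0:
        raise ValueError("the restriction is only defined when mu(M) > 0")
    return {i: mu[i] / mass for i in M}


def coarsening(mu, partition):
    """Coarsening of mu: entry i of the result is the total mass of part M_i."""
    return [sum(mu[j] for j in part) for part in partition]


mu = {1: 0.1, 2: 0.3, 3: 0.6}
restricted = restriction(mu, {2, 3})        # 2 -> 1/3, 3 -> 2/3
coarse = coarsening(mu, [{1}, {2, 3}])      # masses of M_0 and M_1
```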

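Definition 2.2.4. Given an explicit distribution $\mu$ over $[n]$, $\mathrm{Bucket}(\mu, [n], \epsilon)$ is a procedure
that generates a partition $\{M_0, M_1, \ldots, M_k\}$ of the domain $[n]$, where $k = \frac{\log n}{\log(1+\epsilon)} \le \frac{2}{\epsilon}\log(n)$.

This partition satisfies the following conditions:

• $M_0 = \{j \in [n] \mid \mu(j) < \frac{1}{n}\}$;

• for all $i \in [k]$, $M_i = \{j \in [n] \mid \frac{(1+\epsilon)^{i-1}}{n} \le \mu(j) < \frac{(1+\epsilon)^{i}}{n}\}$.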
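
A bucketing procedure of this kind can be sketched in a few lines of Python. This is a hedged illustration, not the procedure of [3]: it assumes that bucket $M_0$ collects the elements of probability below $1/n$ and that bucket $M_i$ collects those with $(1+\epsilon)^{i-1}/n \le \mu(j) < (1+\epsilon)^{i}/n$, it uses 0-based indices, and it rounds the number of buckets up to an integer. The function name `bucket` is our own.

```python
import math


def bucket(mu, eps):
    """Partition indices 0..n-1 into buckets M_0, ..., M_k by probability.

    Assumed thresholds: M_0 holds probabilities below 1/n, and M_i (i >= 1)
    holds probabilities in [(1+eps)^(i-1)/n, (1+eps)^i/n).
    """
    n = len(mu)
    k = max(1, math.ceil(math.log(n) / math.log(1 + eps)))
    buckets = [[] for _ in range(k + 1)]
    for j, pj in enumerate(mu):
        if pj < 1.0 / n:
            buckets[0].append(j)
            continue
        # Smallest i >= 1 with pj < (1+eps)^i / n.
        i = math.floor(math.log(n * pj) / math.log(1 + eps)) + 1
        # Clamp to guard against floating-point rounding at the boundaries.
        i = min(max(i, 1), k)
        buckets[i].append(j)
    return buckets


example = bucket([0.5, 0.25, 0.125, 0.125], eps=1.0)
```

Within each bucket other than the first, probabilities differ by a factor of at most $1+\epsilon$, which is what makes the restricted distributions close to uniform.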

Lemma 2.2.5 (Lemma 8 in [3]). Let $\mu$ be a distribution over $[n]$ and let $\{M_0, M_1, \ldots, M_k\} \leftarrow
\mathrm{Bucket}(\mu, [n], \epsilon)$. Then for all $i \in [k]$, $\|\mu{\restriction}_{M_i} - U_{M_i}\|_\infty \le \epsilon/n$.

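Lemma 2.2.6 (Lemma 6 in [3]). Let $\mu, \mu'$ be two distributions over $[n]$ and let the sequence
of sets $\mathcal{M} = \{M_0, M_1, \ldots, M_k\}$ be a partition of $[n]$. If $\|\mu{\restriction}_{M_i} - \mu'{\restriction}_{M_i}\|_1 \le \epsilon_1$ for every
$i \in [k]$ and $\|\mu_{\langle\mathcal{M}\rangle} - \mu'_{\langle\mathcal{M}\rangle}\|_1 \le \epsilon_2$, then $\|\mu - \mu'\|_1 \le \epsilon_1 + \epsilon_2$. Furthermore,
$\|\mu - \mu'\|_1 \le \sum_{0 \le i \le k} \mu(M_i)\,\|\mu{\restriction}_{M_i} - \mu'{\restriction}_{M_i}\|_1 + \epsilon_2$.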
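We reproduce the proof to obtain the "furthermore" claim:
Proof. This results from the following.

\begin{align*}
\|\mu - \mu'\|_1 &= \sum_{0 \le i \le k} \sum_{j \in M_i} |\mu(j) - \mu'(j)| \\
&\le \sum_{0 \le i \le k} \sum_{j \in M_i} |\mu(M_i)\,\mu{\restriction}_{M_i}(j) - \mu(M_i)\,\mu'{\restriction}_{M_i}(j)|
 + \sum_{0 \le i \le k} \sum_{j \in M_i} |\mu(M_i)\,\mu'{\restriction}_{M_i}(j) - \mu'(M_i)\,\mu'{\restriction}_{M_i}(j)| \\
&= \sum_{0 \le i \le k} \sum_{j \in M_i} \mu(M_i)\,|\mu{\restriction}_{M_i}(j) - \mu'{\restriction}_{M_i}(j)|
 + \sum_{0 \le i \le k} \sum_{j \in M_i} \mu'{\restriction}_{M_i}(j)\,|\mu(M_i) - \mu'(M_i)| \\
&= \sum_{0 \le i \le k} \mu(M_i)\,\|\mu{\restriction}_{M_i} - \mu'{\restriction}_{M_i}\|_1
 + \sum_{0 \le i \le k} |\mu(M_i) - \mu'(M_i)| \\
&\le \sum_{0 \le i \le k} \mu(M_i)\,\|\mu{\restriction}_{M_i} - \mu'{\restriction}_{M_i}\|_1 + \epsilon_2
\end{align*}

This provides the "furthermore" claim. To obtain from the above the original claim note that
$\sum_{0 \le i \le k} \mu(M_i)\,\|\mu{\restriction}_{M_i} - \mu'{\restriction}_{M_i}\|_1 \le \sum_{0 \le i \le k} \mu(M_i)\,\epsilon_1 = \epsilon_1$. □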

3 Adaptive testing for uniformity and identity 

In the following we formulate our testing algorithms to have a polynomial dependence on
$\log(\delta^{-1})$. To make it linear we can first run the algorithm $100\log(\delta^{-1})$ times with a fixed
$\frac{1}{3}$ error bound and then take the majority vote.
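
The amplification step is a generic majority vote and can be sketched as follows. This is a minimal Python illustration under stated assumptions: `tester` is a hypothetical zero-argument callable that runs one independent execution of a constant-error tester and returns `True` for accept, the logarithm is taken to be natural, and the repetition count is rounded up to an integer.

```python
import math
import random


def amplify(tester, delta):
    """Run a constant-error tester about 100*log(1/delta) times, return the majority."""
    runs = math.ceil(100 * math.log(1 / delta))
    accepts = sum(1 for _ in range(runs) if tester())
    # Accept exactly when strictly more than half of the runs accepted.
    return 2 * accepts > runs


# Usage: a toy tester that gives the right answer (accept) 80% of the time.
rng = random.Random(0)
verdict = amplify(lambda: rng.random() < 0.8, delta=0.01)
```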

3.1 Testing for uniformity 

Theorem 3.1.1. There is an (adaptive) algorithm testing uniformity using $\mathrm{poly}(\epsilon^{-1}, \log(\delta^{-1}))$
conditional samples independently of $n$.

In fact we will prove something slightly stronger, which will prove useful in the next sections:

Theorem 3.1.2 (Near Uniformity Tester). Let $\mu$ be a known distribution over $[n]$ such that
$\|\mu - U_n\|_\infty < \frac{\epsilon}{100n}$. Identity with $\mu$ can be tested using only $\mathrm{poly}(\epsilon^{-1}, \log(\delta^{-1}))$ conditional
samples by an adaptive algorithm.

Let $\mu'$ be the unknown distribution that is to be sampled from.

Algorithm 3.1.3. (Near Uniformity Tester) The algorithm receives $\mu$, $\epsilon$, $\delta$ and $n$ and operates
as follows.

1. Take $S$ to be $k = (6/\epsilon)\log(\delta^{-1})$ independent samples according to $\mu'$ (unconditioned).

2. Take $U$ to be $k$ members of $\{1, \ldots, n\}$ chosen uniformly at random.

3. Invoke the Identity Tester of Theorem 2.2.1 to check whether $\mu'{\restriction}_{U \cup S}$ is $\frac{\epsilon^2}{600\log(\delta^{-1})}$-close
to $\mu{\restriction}_{U \cup S}$ over $U \cup S$ with bounded error probability $\delta/3$, and answer as the tester did.
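
The three steps of the algorithm can be sketched in Python as below. This is a hedged sketch rather than the paper's implementation: the domain is 0-based, `oracle(A)` is a hypothetical callable returning one sample of the unknown distribution conditioned on the list `A`, `identity_tester` is a hypothetical stand-in for the Identity Tester of Theorem 2.2.1 that the caller must supply, the closeness parameter $\epsilon^2/(600\log(\delta^{-1}))$ is our reading of step 3, and logarithms are taken to be natural.

```python
import math
import random


def near_uniformity_tester(oracle, n, eps, delta, identity_tester, rng=random):
    """Sketch of the three steps of the Near Uniformity Tester.

    oracle(A)       -- one sample of the unknown distribution conditioned on A
    identity_tester -- hypothetical callable
                       (restricted_oracle, support, distance, error) -> bool
    """
    k = math.ceil((6 / eps) * math.log(1 / delta))
    domain = list(range(n))
    # Step 1: k unconditioned samples from the unknown distribution.
    S = {oracle(domain) for _ in range(k)}
    # Step 2: k uniformly random members of the domain.
    U = {rng.randrange(n) for _ in range(k)}
    support = sorted(S | U)
    # Step 3: hand the restriction to U ∪ S over to the identity tester;
    # every sample it requests is a conditional sample on the small set.
    distance = eps ** 2 / (600 * math.log(1 / delta))

    def restricted_oracle():
        return oracle(support)

    return identity_tester(restricted_oracle, support, distance, delta / 3)
```

Only steps 1 and 2 look at the whole domain; every later sample is conditioned on a set of at most $2k$ elements, which is why the sample complexity does not depend on $n$.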

Lemma 3.1.4. The sample complexity of Algorithm 3.1.3 is $\mathrm{poly}(\epsilon^{-1}, \log(\delta^{-1}))$.
Proof. The algorithm draws $k$ samples, and then invokes the closeness tester on a set of size $2k$
and an error parameter polynomial in $\epsilon^{-1}$. Since the sample complexity of the closeness tester
is polynomial in the support size and error parameter, and $k = (6/\epsilon)\log(\delta^{-1})$, the total sample
complexity of Algorithm 3.1.3 is $\mathrm{poly}(\epsilon^{-1}, \log(\delta^{-1}))$. □

Lemma 3.1.5. If $\mu = \mu'$ then Algorithm 3.1.3 accepts with probability at least $1 - \delta$.

Proof. If $\|\mu - \mu'\|_1 = 0$ then $\|\mu{\restriction}_{U \cup S} - \mu'{\restriction}_{U \cup S}\|_1 = 0$ and then the algorithm will accept if the
closeness tester does, which will happen with probability at least $1 - \delta/3$. □

Let the individual probabilities for the distribution $\mu$ be denoted by $p_1, \ldots, p_n$ and the
probabilities for the distribution $\mu'$ be denoted by $p'_1, \ldots, p'_n$. We first note that

$$2d(\mu, \mu') = \|\mu - \mu'\|_1 = \sum_{i=1}^{n} |p_i - p'_i| = 2\sum_{p'_i < p_i} (p_i - p'_i) = 2\sum_{p'_i > p_i} (p'_i - p_i).$$

Assume from now on that this distance is at least $2\epsilon$ (which corresponds to variation distance
at least $\epsilon$).

Lemma 3.1.6. With probability at least $1 - \delta/3$ we have an $i \in S$ for which $(p'_i - p_i) > \epsilon/(2n)$.
Proof. Clearly $\sum_{p_i < p'_i \le p_i + \epsilon/(2n)} (p'_i - p_i) \le \frac{1}{2}\epsilon$. Therefore:

$$\sum_{p'_i > p_i + \epsilon/(2n)} p'_i \;\ge \sum_{p'_i > p_i + \epsilon/(2n)} (p'_i - p_i) = \sum_{p'_i > p_i} (p'_i - p_i) - \sum_{p_i < p'_i \le p_i + \epsilon/(2n)} (p'_i - p_i) \;\ge\; \tfrac{1}{2}\epsilon.$$

This means that after $(6/\epsilon)\log(\delta^{-1})$ samples, with probability at least $1 - \delta/3$ we will get
an $i$ with such a $p'_i$ into $S$. □

Lemma 3.1.7. With probability at least $1 - \delta/3$ we have an $i \in U$ for which $p'_i < p_i$.

Proof. Note that $\sum_{p'_i < p_i} (p_i - p'_i) \le |\{i : p'_i < p_i\}| \cdot \max_i\{p_i\}$. Now since $\max_i\{p_i\} < (1 + \frac{\epsilon}{100})\frac{1}{n}$
there are at least $(\epsilon/2)n$ such $i$. A uniformly random choice of $(6/\epsilon)\log(\delta^{-1})$ indexes will get
one with probability at least $1 - \delta/3$. □

Lemma 3.1.8. When both events above occur, $\mu'{\restriction}_{U \cup S}$ is at least $\frac{\epsilon^2}{600\log(\delta^{-1})}$-far from $\mu{\restriction}_{U \cup S}$
over $U \cup S$.



Proof. Note that $|S \cup U| = 2k = 2 \cdot (6/\epsilon)\log(\delta^{-1})$, and that the two events above mean that

+t/2 i 
-e/100Pj'- 

l+g/2 



there are % and j in this set such that p[ > j^n^p'j- Denoting the conditional probabilities 



Qi = Pi/ ^{S U U) and <^ = ^///(S 1 U E/), we note that we obtain q\ > i +e % 00 Qjj while both 
qi and gj are bounded between ppwjjjj] ^jg an< ^ i-e/ioo 21- • Therefore, either q\ > qi + ^ or 
1j < U ~ I5P Eitrier wa Y! ^(A 4 tt/us 1 ,^' \uus) > CTfe> which concludes the proof. □ 

This concludes the soundness proof, as the last step of the algorithm checks the closeness of
$\mu'{\restriction}_{U \cup S}$ to $\mu{\restriction}_{U \cup S}$ with this approximation parameter. Thus we obtain:

Lemma 3.1.9. Let $\mu$ be a known distribution over $[n]$. Then if $\|\mu - U_n\|_\infty < \frac{\epsilon}{100n}$ and
$d(\mu, \mu') \ge \epsilon$ then Algorithm 3.1.3 rejects with probability at least $1 - \delta$.

Proof. Follows from a union bound for the events of Lemma 3.1.6 and Lemma 3.1.7 and the
failure probability of the test invoked in the last step of the algorithm. □
3.2 Testing identity to a known distribution

Recall that if we define $\log^{(0)}(n) = n$ and by induction $\log^{(k+1)}(n) = \log(\log^{(k)}(n))$, then the
$\log^*$ function is defined by $\log^*(n) = \min\{k : \log^{(k)}(n) \le 1\}$.
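
For a sense of how slowly this function grows, the iterated logarithm can be computed directly. The Python sketch below assumes base-2 logarithms and the stopping threshold $\log^{(k)}(n) \le 1$; the text does not fix the base, so both are assumptions of this illustration.

```python
import math


def log_star(n):
    """Number of times log2 must be applied to n before the value drops to at most 1."""
    k = 0
    x = float(n)
    while x > 1:
        x = math.log2(x)
        k += 1
    return k


values = [log_star(n) for n in (1, 2, 4, 16, 65536)]
```

Under these assumptions the listed inputs give 0, 1, 2, 3 and 4 respectively, and the next input with a larger value is $2^{65536}$.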

Theorem 3.2.1. Testing identity with a known distribution can be done by an adaptive algorithm
using $\mathrm{poly}(\log^* n, \epsilon^{-1}, \log(\delta^{-1}))$ conditional samples.

Let $\mu$ be the known distribution and $\mu'$ be the unknown distribution that is accessed by
sampling. The following is an algorithm for testing identity to the known distribution $\mu$ over
$[n]$. In the initial run we feed it $m = n$, but in the recursive runs it keeps track of $m$ as the
"original $n$".

Algorithm 3.2.2. (Identity Test) The algorithm receives $\epsilon$, $\delta$, $n$, $m$ and $\mu$, operating as follows.

0. If n < ^ 4001 °s( 1 / ,: ) \ g* rrij then perform a brute-force test: Take 100 log(l/<5)e -2 n 2 log n 

samples to write a distribution p, that is |-close to // (with probability 1 — 5); if d(fi, fi) < | 
then ACCEPT and otherwise REJECT. 

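1. Let $\mathcal{M} = \{M_0, M_1, \ldots, M_k\} \leftarrow \mathrm{Bucket}(\mu, [n], \frac{\epsilon}{200\log^* m})$.

2. Sample $r = 4\epsilon^{-1}\log^*(m)\log(\delta^{-1})$ elements from $\mu'$. Let $M_{i_1}, \ldots, M_{i_r}$ be the buckets
where these elements lie.

3. For every bucket $M_{i_1}, \ldots, M_{i_r}$ test using the Near Uniformity Test (Theorem 3.1.2)
whether $\|\mu{\restriction}_{M_{i_j}} - \mu'{\restriction}_{M_{i_j}}\|_1 \ge \frac{\epsilon}{2\log^* m}$, with error bound $\frac{\delta\epsilon}{12\log^*(m)\log(\delta^{-1})}$.

4. If for any $i_j$ we have $\|\mu{\restriction}_{M_{i_j}} - \mu'{\restriction}_{M_{i_j}}\|_1 \ge \frac{\epsilon}{2\log^* m}$ then REJECT.

5. Else recursively test if $\|\mu_{\langle\mathcal{M}\rangle} - \mu'_{\langle\mathcal{M}\rangle}\|_1 \le \epsilon\,(1 - \frac{1}{\log^* m})$ with error bound $\frac{\delta}{3}$. If not then
REJECT else ACCEPT.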

First, we bound the number of recursion levels that can occur.

Lemma 3.2.3. Algorithm 3.2.2 never enters more than $2\log^*(n)$ recursion levels from the
initial $n = m$ call.

Proof. Note that in the first 2 log* (n) recursion levels distance parameter that is passed is still 

/ i \21og*(n) 

at least e ( 1 — log + w J > ^,sowe will prove the bound on the number of levels even if this 

is the distance parameter that is used in all but the first level. If log(ra) < ( 400 l °^ 1 ' e > log* m 



then after at most one recursion level the test goes to the brute force procedure in Step and 
ends. Otherwise, note that the recursive call now receives n' < 400e lo gW lQ g ( m ) < l g 3 (ri), 
and that call itself will make a recursive call with n" < 1200e lo g lo g( n ) lo g ( m ) < ^og n ( umess ft 
already terminated for some other reason). This is sufficient for the bound. □ 

Lemma 3.2.4. If $d(\mu, \mu') = 0$ then Algorithm 3.2.2 accepts with probability at least $1 - \delta$.

Proof. The base case, in which the brute-force test of Step 0 is performed, is clear. Otherwise,
if $\|\mu - \mu'\|_1 = 0$ then for all buckets $M_i$ we have $\|\mu{\restriction}_{M_i} - \mu'{\restriction}_{M_i}\|_1 = 0$ and
$\|\mu_{\langle\mathcal{M}\rangle} - \mu'_{\langle\mathcal{M}\rangle}\|_1 = 0$. From Lemma 2.2.5 we know that
$\|\mu{\restriction}_{M_i} - U_{M_i}\|_\infty \le \frac{\epsilon}{200\log^* m \cdot n} = \frac{\epsilon'}{100n}$, where $\epsilon'$ is the distance parameter
fed to the Near Uniformity Tester, and hence the Near Uniformity Tester (Theorem 3.1.2) is
applicable and will accept with probability $1 - \frac{\delta\epsilon}{12\log^*(m)\log(\delta^{-1})}$. Taking the union bound over
the number of samples taken and the probability of failure for the recursive call gives us the
desired bound. □
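For soundness we need the following lemma.
Lemma 3.2.5. If $\|\mu - \mu'\|_1 \ge \epsilon$ then for any $t$ at least one of the following two will happen:

1. $\sum_{\{i \,:\, \|\mu{\restriction}_{M_i} - \mu'{\restriction}_{M_i}\|_1 \ge \epsilon/2t\}} \mu(M_i) \ge \epsilon/2t$

2. $\|\mu_{\langle\mathcal{M}\rangle} - \mu'_{\langle\mathcal{M}\rangle}\|_1 \ge \epsilon\,(1 - 1/t)$

Proof. Recall Lemma 2.2.6:

$$\|\mu - \mu'\|_1 \le \sum_{0 \le i \le k} \mu(M_i) \sum_{j \in M_i} |\mu{\restriction}_{M_i}(j) - \mu'{\restriction}_{M_i}(j)| + \|\mu_{\langle\mathcal{M}\rangle} - \mu'_{\langle\mathcal{M}\rangle}\|_1$$

Thus if $\|\mu_{\langle\mathcal{M}\rangle} - \mu'_{\langle\mathcal{M}\rangle}\|_1 < \epsilon\,(1 - 1/t)$ and $\sum_{\{i \,:\, \|\mu{\restriction}_{M_i} - \mu'{\restriction}_{M_i}\|_1 \ge \epsilon/2t\}} \mu(M_i) < \epsilon/2t$ then we have
$\|\mu - \mu'\|_1 < \epsilon$, a contradiction. □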

Lemma 3.2.6. If $d(\mu, \mu') \ge \epsilon$ then Algorithm 3.2.2 rejects with probability at least $1 - \delta$.
Proof. The base case, in which the brute-force test of Step 0 is performed, is clear. Refer now
to Lemma 3.2.5, taking $t = \log^* m$. Assume that we are in the first case of the lemma, that is
$\sum_{\{i \,:\, \|\mu{\restriction}_{M_i} - \mu'{\restriction}_{M_i}\|_1 \ge \epsilon/2t\}} \mu(M_i) \ge \epsilon/2t$. Therefore, the probability of sampling an index for which
the test in Line 3 should reject is at least $\frac{\epsilon}{2\log^* m}$. This implies that the probability that one of
the sampled elements is such is at least $1 - \delta/3$, and since the probability that any of the calls to
the Near Uniformity Test fails is at most $\delta/3$ as well, we accept with probability at most $2\delta/3$.

Now assuming that we are in the second case of Lemma 3.2.5, by the induction hypothesis
we reject with probability at least $1 - \delta/3$. Thus the overall error probability is at most $\delta$. □

Lemma 3.2.7. The sample complexity of Algorithm 3.2.2 is $\mathrm{poly}(\log^* n, \epsilon^{-1}, \log(\delta^{-1}))$.
Proof. If $n$ satisfies the condition of Step 0, then it is polynomial in $\epsilon^{-1}$ and $\log^* m$, and so is
the result of substituting it in the number of queries of the brute force check of Step 0,
$q_b(\epsilon, \delta, n) = 100\log(1/\delta)\,\epsilon^{-2} n^2 \log n$. For analyzing the sample complexity when the above does not hold for
$m = n$, let $q(\epsilon, \delta, n, m)$ denote the sample complexity of the algorithm. By the algorithm's
definition, we have the following formula, where $q_u$ is the sample complexity of the Near Uniformity
Tester:

$$q(\epsilon, \delta, n, m) \le 4\epsilon^{-1}\log^*(m)\log(\delta^{-1})\left(1 + q_u\!\left(\frac{\epsilon}{2\log^* m}, \frac{\delta\epsilon}{12\log^*(m)\log(\delta^{-1})}, n\right)\right)
 + q\!\left(\epsilon\left(1 - \frac{1}{\log^* m}\right), \frac{\delta}{3}, \frac{400\log(n)\log^*(m)}{\epsilon}, m\right).$$

According to Lemma 3.2.3, after at most $2\log^* n$ recursion levels from the initial $n = m$,
the right hand side is now within the realm of the brute force check, and we get a summand 

bounded by q b (e/e 2 ,5 ■ 3- 21o s* n , ( ^ooiog(iA) l og *ra) 3 ) = poly(log* n, e -1 , log(r Therefore: 

/ / £ e . $ . 3-2 log* n \\ 

q(e,5,n,n) < Be" (log* n) 2 log^ 1 ) {l + q u (^-^ 40e 2 (log* n) 2 log(^) ' "J J 

+poly(log*n,e- 1 ,log(<T 1 )) 

Since by Lemma 3.1.4 the Near Uniformity Tester has sample complexity polynomial in
the distance parameter and polylogarithmic in the error bound, we obtain the statement of the
lemma. □

4 Testing any label-invariant property



We show here the following "universal testing" theorem for label-invariant properties.

Theorem 4.0.1. Every label-invariant property of distributions can be tested adaptively using
at most $\mathrm{poly}(\log n, \epsilon^{-1}, \log(\delta^{-1}))$ conditional samples.

It is in fact a direct corollary of the following learning result.

Theorem 4.0.2. There exists an algorithm that uses $\mathrm{poly}(\log n, \epsilon^{-1}, \log(\delta^{-1}))$ adaptive conditional
samples to output a distribution $\tilde{\mu}$ over $[n]$, so that with probability at least $1 - \delta$ some
permutation of $\tilde{\mu}$ will be $\epsilon$-close to $\mu$.

To derive Theorem 4.0.1, use Theorem 4.0.2 to obtain a distribution $\tilde{\mu}$ that is $\epsilon/2$-close to
a permutation of $\mu$, and then accept if and only if $\tilde{\mu}$ is $\epsilon/2$-close to the tested property.

The main idea of the proof of Theorem 4.0.2 is to use a bucketing, and try to approximate the
number of members of every bucket, which allows us to construct an approximate distribution.
However, there are some roadblocks, foremost among them the fact that we cannot really query
the value $\mu(i)$. Instead we will construct a way to approximate the distribution, and then go
further to simulate the approximated distribution instead of the original.

In all the following we assume that n is a power of 2, as otherwise we can "pad" the 
probability space with additional zero-probability members. 

4.1 Bucketing and approximations 

We need a bucketing that also goes into smaller probabilities than those needed for the other 
sections. 

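Definition 4.1.1. Given an explicit distribution $\mu$ over $[n]$, $\mathrm{Bucket}'(\mu, [n], \epsilon)$ is a procedure
that generates a partition $\{M_0, M_1, \ldots, M_k\}$ of the domain $[n]$, where $k = \frac{\log(n/\epsilon)}{\log(1+\epsilon)}$. This
partition satisfies the following conditions:

• $M_0 = \{j \in [n] \mid \mu(j) < \frac{\epsilon}{n}\}$

• for all $i \in [k]$, $M_i = \{j \in [n] \mid \frac{(1+\epsilon)^{i-1}}{n}\epsilon \le \mu(j) < \frac{(1+\epsilon)^{i}}{n}\epsilon\}$.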
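The bucketing above can be illustrated by a minimal sketch, assuming the thresholds $\frac{(1+\epsilon)^{i-1}}{n}\epsilon$ and the bucket count $k = \log(n/\epsilon)/\log(1+\epsilon)$ (rounded up here, whereas the text ignores rounding). The function name `bucket_prime` and the 0-based indexing of the domain are illustrative choices, not part of the paper.

```python
import math

def bucket_prime(mu, eps):
    """Partition indices 0..n-1 of an explicit distribution into buckets M_0..M_k.

    Bucket 0 holds indices with mu[j] < eps/n; bucket i >= 1 holds indices with
    (1+eps)^(i-1) * eps/n <= mu[j] < (1+eps)^i * eps/n (up to float rounding).
    """
    n = len(mu)
    k = math.ceil(math.log(n / eps) / math.log(1 + eps))
    buckets = [[] for _ in range(k + 1)]
    for j, p in enumerate(mu):
        if p < eps / n:
            buckets[0].append(j)
            continue
        # Smallest i >= 1 whose upper threshold exceeds p.
        i = math.floor(math.log(p * n / eps) / math.log(1 + eps)) + 1
        buckets[min(i, k)].append(j)  # clamp the boundary case p = 1
    return buckets
```

For example, `bucket_prime([0.5, 0.25, 0.25, 0.0], 0.1)` places the zero-probability element in bucket 0 and the two elements of weight 0.25 in a common bucket.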

In the rest of this section, bucketing will always refer to this version. Also, from here on we 
fix $\epsilon$ and $k = \frac{\log(n/\epsilon)}{\log(1+\epsilon)}$ as above (as well as mostly ignore floor and ceiling signs). We also

assume that e is small enough, say smaller than 

Suppose that we have $m_0, \ldots, m_k$, where $m_i = |M_i|$ is the size of the $i$'th set in the bucketing
of a distribution $\mu$. Then we can use these to construct a distribution that is guaranteed to be
close to some permutation of $\mu$.

Definition 4.1.2. Given $m_0, \ldots, m_k$ for which $\sum_{j=0}^{k} m_j = n$ and $\epsilon > 0$, the tentative distribution
over $[n]$ is the one constructed according to the following.

• Set $r_1, \ldots, r_n$ so that $|\{i : r_i = 0\}| = m_0$ and $|\{i : r_i = \frac{(1+\epsilon)^{j-1}}{n}\epsilon\}| = m_j$ for every $1 \le j \le k$
(the order of $r_1, \ldots, r_n$ is arbitrary).

• Set a distribution $\tilde{\mu}$ over $[n]$ by setting $\tilde{\mu}(i)$ equal to $r_i / \sum_{j=1}^{n} r_j$.
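The construction of the tentative distribution can be sketched as follows, assuming the per-bucket weight $\frac{(1+\epsilon)^{j-1}}{n}\epsilon$ for bucket $j \ge 1$. The function name `tentative_distribution` is hypothetical, and the elements are listed bucket by bucket, which is one of the arbitrary orders the definition allows.

```python
def tentative_distribution(m, eps):
    """Given bucket sizes m = [m_0, ..., m_k], return the tentative distribution.

    Each of the m[j] elements of bucket j >= 1 gets the raw weight
    (1+eps)^(j-1) * eps / n, bucket 0 gets weight 0, and the result is normalized.
    """
    n = sum(m)
    r = [0.0] * m[0]
    for j in range(1, len(m)):
        r.extend([(1 + eps) ** (j - 1) * eps / n] * m[j])
    total = sum(r)
    if total == 0:
        raise ValueError("all elements are in bucket 0; nothing to normalize")
    return [x / total for x in r]
```

With `m = [1, 2, 1]` and `eps = 0.5` the raw weights are 0, 0.125, 0.125 and 0.1875, so the normalized output is 0, 2/7, 2/7, 3/7.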
To gain some intuition, note the following.

Observation 4.1.3. If $M_0, \ldots, M_k$ is the bucketing of $\mu$ and $\tilde{\mu}$ is the tentative distribution
according to $m_0 = |M_0|, \ldots, m_k = |M_k|$, then $\tilde{\mu}$ is $2\epsilon$-close to some permutation of $\mu$.

Proof. We assume that we have already permuted $\mu$ so that each $\tilde{\mu}(i)$ refers to an $r_i$ set according
to the bucket $M_j$ satisfying $i \in M_j$ (such a permutation is possible because here we used the
actual sizes of the buckets).

We recall that the distance is in particular equal to $\sum_{\{i : \tilde{\mu}(i) < \mu(i)\}} (\mu(i) - \tilde{\mu}(i))$. Referring to
the $r_i$ of the definition above, we note that in this case $\sum_{i=1}^{n} r_i \le \sum_{i=1}^{n} \mu(i) = 1$ and hence
$\tilde{\mu}(i) \ge r_i$. For $i \notin M_0$, this means that $\tilde{\mu}(i) \ge (1 - \epsilon)\mu(i)$. For the rest we just note that
$\sum_{i \in M_0} \mu(i) \le \epsilon$. Together we get the required bound. □

The above observation essentially states that it is enough to find the numbers $m_0, \ldots, m_k$
associated with $\mu$. However, the best we can hope for is to somehow estimate the size, or total
probability, of every bucket. The following shows that this is in fact sufficient. 

Definition 4.1.4. Given ao,...,ctk for which Ylj=o a j = 1) the bucketization thereof is the 
sequence of integers mo, • • • , mk defined by the following. 

• For any 1 < j < k let rhj be the integer closest to na^ (where an "exact half" is arbitrarily 
rounded down). 

• If Y^=i > n -> then decrease the rhj until they sum up to n, each time picking j to be 
the smallest index for which rhj > and decreasing that quantity by 1. 

• Finally set mo = n — Y^=i "V 

We say that the bucketization has failed if in the second step we had to decrease any rhj for 
which ^±^e > f • 

Lemma 4.1.5. Suppose that mo, . . . , m^, cto, . . . , are such that : 
j= mj = n 

.£^™^ e <i 

• EjU «j = 1 

• \rrij - aj|^-t^ — e < ^ for all 1 < j < k 

and let rho, . . . , m^ be the bucketization of cto, ■ ■ ■ > a k- Then mo, • • • , m^ are all well defined (the 
bucketization process did not fail), and additionally if fx is the tentative distribution according 
to mo,...,mfc and fx is the tentative distribution according to rho, ■ ■ ■ ,rhk, then the distance 
between fx and fx (after some permutation) is at most 4e. 

Proof. First thing to note is that rrij = rhj for all j for which ( - 1+ ^ J — e > |, before the decreasing 
step, so there will be no need to decrease these values and the bucketization will not fail. 

For all other j > 1, before decreasing some of the rhj we have that \rrij — rhj | < ~ 1+ ^ J — e < | 

(if ( - 1+ ^ J — e < | then the distance is not more than doubled by the rounding, and otherwise 
it follows from \ctj — rhj\ < 1). Since the bucketization did not fail, the decreasing step only 

affects values rhj for which ^ 1+e ^ — e < -|, and the total required decrease in them was by not 
more than k (as the rounding in the first step of the bucketization added no more than 1 to 

each value), we obtain the total bound £? =1 \mj — rhj \ ^ 1+e ^ — e < 3e. 
Let ri denote the corresponding values in the definition of fx being the tentative distribution
according to too, . . . ,irik, and fj be the analog values in the definition of ft being the tentative 
distribution according to mo, . . . , to^. By what we already know about Ylj=i \ m j ~ "^il^^T — 
we have in particular Y17=i ^ = Y17=i r * ^ Combined with the known bounds on Y^l=\ r «i 
we can conclude by finding a permutation for which we can bound Y17=i \ r i ~ by 3e, which 
will give the 4e bound on the distribution distance \ Y17=i I AW ~~ 

The permutation we take is the one that maximizes the number of z's for which rj = fjj for 
the value ( - 1+ ^ J — e we can find minjTOj, to^} such i's (for every 1 < j < k), and the hypothetical 
worst case is that whenever ri / fj one of them is zero (sometimes the realizable worst case is 

in fact not as bad as the hypothetical one). Thus we obtain the Ylj=i \ m j ~ ^il^TT — 6 — ^ e 
bound leading to the 4e bound on the distribution distance. □ 

A problem still remains, in that sampling from $\mu$ will not obtain a value $\alpha_j$ close enough to
the required $m_j \frac{(1+\epsilon)^{j-1}}{n}\epsilon$. The variations in the $\mu(i)$ inside the bucket $M_j$ itself could be higher
than the accuracy that we need here. In the next subsection we will construct not only a "bucket
identifying" oracle, but tie it with a sampler that will simulate the approximate distribution
rather than the original $\mu$.



4.2 Ratio trees and reconstituted distributions 

The main driving force in our algorithm is a way to estimate the ratio between the distribution 
weight of two disjoint sets. To make it into a weight oracle for a value $i \in [n]$, we will use
successive partitions of [n], through a fixed binary tree. Remember that here n is assumed to 
be a power of 2. 

We first define how to "reconstruct" a distribution from a tree with ratios, and afterward 
show how to put the ratios there. 

Definition 4.2.1. Let $T$ be a (full) balanced binary tree with $n$ leaves labeled by $[n]$. Let $U$
be the set of non-leaf nodes of the tree, and assume that we have a function $\alpha : U \to [0,1]$. For
$u \in U$ denote by $L(u)$ the set of leaves that are descendants of the left child of $u$, and by $R(u)$
the leaves that are descendants of the right child of $u$.

The reconstituted distribution according to $\alpha$ is the distribution $\tilde{\mu}$ that is calculated for every
$i \in [n]$ as follows:

• Let $u_1, \ldots, u_{\log(n)+1}$ be the root to leaf path for $i$ (so in particular $u_{\log(n)+1} = i$).

• For every $1 \le j \le \log n$, set $p_j = \alpha(u_j)$ if $i$ is a descendant of the left child of $u_j$ (that is if
$i \in L(u_j)$), and otherwise set $p_j = 1 - \alpha(u_j)$.

• Set $\tilde{\mu}(i) = \prod_{j=1}^{\log n} p_j$.
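The reconstitution step can be sketched in a few lines. This is an illustration under assumptions that are not in the paper: leaves are 0-indexed, an internal node is identified by the half-open interval `(lo, hi)` of leaves below it, and `exact_ratios` is a hypothetical helper that fills in the true ratios (with the arbitrary value 1/2 on zero-mass nodes) so that the output can be compared with the input distribution.

```python
def reconstituted(alpha, n):
    """Return the reconstituted distribution over leaves 0..n-1 (n a power of 2).

    alpha maps an internal node (lo, hi) to the ratio assigned to its left child.
    """
    mu = []
    for i in range(n):
        lo, hi, p = 0, n, 1.0
        while hi - lo > 1:            # walk the root-to-leaf path of i
            mid = (lo + hi) // 2
            a = alpha[(lo, hi)]
            if i < mid:               # i is below the left child
                p, hi = p * a, mid
            else:                     # i is below the right child
                p, lo = p * (1 - a), mid
        mu.append(p)
    return mu

def exact_ratios(mu):
    """Hypothetical helper: the true left-child ratio at every internal node."""
    alpha = {}
    def fill(lo, hi):
        if hi - lo == 1:
            return
        mid = (lo + hi) // 2
        total = sum(mu[lo:hi])
        alpha[(lo, hi)] = sum(mu[lo:mid]) / total if total > 0 else 0.5
        fill(lo, mid)
        fill(mid, hi)
    fill(0, len(mu))
    return alpha
```

Feeding `exact_ratios(mu)` into `reconstituted` returns `mu` again, up to floating-point error.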

For intuition, note the following trivial observation. 

Observation 4.2.2. If for a distribution $\mu$ we set $\alpha(u) = \frac{\mu(L(u))}{\mu(L(u)) + \mu(R(u))}$, using an arbitrary
value (say $\frac{1}{2}$) for the case where $\mu(L(u)) + \mu(R(u)) = 0$, then the reconstituted distribution $\tilde{\mu}$ is
identical to $\mu$.

However, we cannot know the values $\frac{\mu(L(u))}{\mu(L(u)) + \mu(R(u))}$. The best we can do is the following.

Definition 4.2.3. An $(\epsilon, \delta)$-ratio estimator for $T$ and a distribution $\mu$ is an algorithm $A$ that
given a non-leaf vertex $u \in U$ outputs a number $r$, such that with probability $1 - \delta$ we have
that $\frac{\mu(L(u))}{\mu(L(u)) + \mu(R(u))} - \epsilon \le r \le \frac{\mu(L(u))}{\mu(L(u)) + \mu(R(u))} + \epsilon$.

Algorithm 4.2.4. (Ratio Estimator) The algorithm is given a balanced binary tree $T$ with $n$
leaves, a non-leaf vertex $u \in U$ and parameters $\epsilon, \delta$. It also has conditional sample access to a
distribution $\mu$.

1. Sample $t = 2\epsilon^{-2}\log(\delta^{-1})$ elements according to $\mu|_{L(u) \cup R(u)}$, and let $s$ be the number of
samples that are in $L(u)$.

2. Return the ratio $\frac{s}{t}$ of the samples that are in $L(u)$ to the total number of samples.

Lemma 4.2.5. For any $\epsilon, \delta$, Algorithm 4.2.4 is an $(\epsilon, \delta)$-ratio estimator for $T$ and $\mu$ which uses
$t = 2\epsilon^{-2}\log(\delta^{-1})$ non-adaptive conditional samples from $\mu$.

Proof. The number of samples used is immediate. Let us now proceed to show that this is
indeed an $(\epsilon, \delta)$-ratio estimator. The expected value of $\frac{s}{t}$ is $\frac{\mu(L(u))}{\mu(L(u)) + \mu(R(u))}$.

By Chernoff's inequality, the probability that $\frac{s}{t}$ deviates from its expected value by an
additive term of more than $\epsilon$ is at most $2\exp(-2\epsilon^2 \cdot t)$. By our choice of $t$ we obtain the
statement. □
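The Ratio Estimator is simple enough to simulate. The sketch below is not the paper's code: `conditional_sample` is a stand-in oracle built from an explicit distribution, the uniform fallback on a zero-mass set is an assumed convention, and the logarithm is taken to be natural, with $t$ rounded up.

```python
import math
import random

def conditional_sample(mu, subset, rng):
    """Simulated oracle: draw i from subset with probability proportional to mu[i]."""
    weights = [mu[i] for i in subset]
    if sum(weights) == 0:
        return rng.choice(subset)     # assumed convention for zero-mass sets
    return rng.choices(subset, weights=weights, k=1)[0]

def ratio_estimator(mu, left, right, eps, delta, rng):
    """Estimate mu(left) / (mu(left) + mu(right)) from t conditional samples."""
    t = math.ceil(2 * eps ** -2 * math.log(1 / delta))
    left_set = set(left)
    both = list(left) + list(right)
    s = sum(conditional_sample(mu, both, rng) in left_set for _ in range(t))
    return s / t
```

For `mu = [0.1, 0.2, 0.3, 0.4]`, `left = [0, 1]` and `right = [2, 3]` the true ratio is 0.3, and the returned value should fall within `eps` of it except with probability about `delta`.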

If we could "populate" the entire tree $T$ (through the function $\alpha$) by values that do not
deviate by much from the corresponding ratios, then we would be able to create an estimate
for $\mu$ that is good for most values.

Definition 4.2.6. The function $\alpha : U \to [0, 1]$ is called $\epsilon$-fine if $\left|\alpha(u) - \frac{\mu(L(u))}{\mu(L(u)) + \mu(R(u))}\right| \le
\left(\frac{\epsilon}{2\log(n)}\right)^2$ for every $u \in U$.

We call a distribution $\tilde{\mu}$ $\epsilon$-fine if there exists a set $B$ such that $\mu(B) \le \epsilon$, and additionally
$\tilde{\mu}(i) = (1 \pm \epsilon)\mu(i)$ for every $i \in [n] \setminus B$.

Lemma 4.2.7. If $\alpha$ is $\epsilon$-fine then the reconstituted distribution $\tilde{\mu}$ is $\epsilon$-fine.

Proof. To define the set $B$, for every $i$ consider the $p_1, \ldots, p_{\log n}$ that are set as per Definition
4.2.1, and set $i \in B$ if and only if there exists some $p_j$ that is smaller than $\frac{\epsilon}{2\log(n)}$. Next,
denote by $q_1, \ldots, q_{\log n}$ the "intended" values, that is $q_j = \frac{\mu(L(u_j))}{\mu(L(u_j)) + \mu(R(u_j))}$ if $i \in L(u_j)$ and $q_j =
\frac{\mu(R(u_j))}{\mu(L(u_j)) + \mu(R(u_j))}$ otherwise. Noting that $p_j$ does not deviate from $q_j$ by more than $\left(\frac{\epsilon}{2\log(n)}\right)^2$,
an induction over $\log n$ (the height of $T$) gives that $1 - \mu(B)$ is at least $\left(1 - \frac{\epsilon}{\log n}\right)^{\log n} \ge 1 - \epsilon$.
For $i \in [n] \setminus B$, we note that in this case $p_j = \left(1 \pm \frac{\epsilon}{2\log n}\right) q_j$, and hence $\tilde{\mu}(i) = \prod_{j=1}^{\log n} p_j =
\left(1 \pm \frac{\epsilon}{2\log n}\right)^{\log n} \prod_{j=1}^{\log n} q_j = (1 \pm \epsilon)\mu(i)$. □

We should note here that it is not hard to prove that an $\epsilon$-fine distribution $\tilde{\mu}$ is of distance
not more than $4\epsilon$ from the original $\mu$. However, we will in fact refer to yet another distribution
which will be easier to estimate, so we will show closeness to it instead.

Definition 4.2.8. Given an $\epsilon$-fine distribution $\tilde{\mu}$ and its respective set $B$, its $\epsilon$-trimmed distribution
$\bar{\mu}$ is a distribution over $[n] \cup \{0\}$ defined by the following.

• For $i \in B \cup \{i : \tilde{\mu}(i) < \frac{\epsilon}{n}\}$ we set $\bar{\mu}(i) = 0$. For such $i$ we also set $j_i = 0$.

• For all other $i \in [n]$ we set $j_i$ to be the largest integer for which $\frac{(1+\epsilon)^{j_i - 1}}{n}\epsilon \le \tilde{\mu}(i)$, and set
$\bar{\mu}(i) = \frac{(1+\epsilon)^{j_i - 1}}{n}\epsilon$.

• Finally set $\bar{\mu}(0) = 1 - \sum_{i=1}^{n} \bar{\mu}(i)$; note that $\bar{\mu}(i) \le \tilde{\mu}(i)$ for all $1 \le i \le n$ and hence
$\bar{\mu}(0) \ge 0$.

The $\epsilon$-renormalized distribution $\hat{\mu}$ over $[n]$ is just the conditioning $\bar{\mu}|_{[n]}$.

It is important to know that the renormalized distribution is in fact (a permutation of) the
tentative distribution according to $m_0, \ldots, m_k$, where for $0 \le j \le k$ we set $m_j = |\{i : j_i = j\}|$.

Lemma 4.2.9. The renormalized distribution $\hat{\mu}$ corresponding to an $\epsilon$-fine distribution $\tilde{\mu}$ is
$4\epsilon$-close to $\mu$.

Proof. First we consider the trimmed distribution $\bar{\mu}$, and its distance from $\mu$ (when we extend it
by setting $\mu(0) = 0$). Recalling that this variation distance is equal to $\sum_{\{i : \bar{\mu}(i) < \mu(i)\}} (\mu(i) - \bar{\mu}(i))$,
we partition the set of relevant $i$'s into two subsets.

• For those $i$ that are in $B$ (for which $\bar{\mu}(i) = 0$), the total difference is $\mu(B) \le \epsilon$.

• For any other $i$ for which $\bar{\mu}(i) < \mu(i)$, note that $\bar{\mu}(i) \ge \frac{1}{1+\epsilon}\tilde{\mu}(i) \ge \frac{1-\epsilon}{1+\epsilon}\mu(i) \ge (1 - 3\epsilon)\mu(i)$.
This means that the sum over differences for all such $i$ is bounded by $3\epsilon$.

• We never have $\bar{\mu}(0) < \mu(0)$.

Thus the distance between $\bar{\mu}$ and $\mu$ is not more than $4\epsilon$. As for $\hat{\mu}$, the sum of differences over $i$
for which $\hat{\mu}(i) < \mu(i)$ is only made smaller (the conditioning only increases the probability for
every $i > 0$), and so the $4\epsilon$ bound remains. □



4.3 Distribution samplers and learning 

For our learning algorithm we need to not only sample from the distribution $\mu$, but to be able to
"report" $\mu(i)$ for every $i$ thus sampled. This we cannot do, but it turns out that we can sample
from a close distribution $\tilde{\mu}$ while reporting $\tilde{\mu}(i)$. In fact we will sample from a distribution that
in itself will be drawn from the following distribution over distributions.

Definition 4.3.1. The $(\epsilon, \delta)$-condensation of $\mu$ is the distribution over $\epsilon$-fine distributions (with
respect to $\mu$) that is defined by the following process.

• Let $T$ be a (full) balanced binary tree whose leaves are labeled by $[n]$, and $U$ be its set of
internal nodes.

• For every $u \in U$, let $\alpha(u)$ be the (randomized) result of running the corresponding
$\left(\left(\frac{\epsilon}{2\log(n)}\right)^2, \delta\right)$-Ratio Estimator (Algorithm 4.2.4), when conditioned on this result indeed
being of distance not more than $\left(\frac{\epsilon}{2\log(n)}\right)^2$ from $\frac{\mu(L(u))}{\mu(L(u)) + \mu(R(u))}$. This is done independently
for every $u$.

• The drawn distribution $\tilde{\mu}$ is the reconstituted distribution according to $T$ and $\alpha$.

The algorithm that we define next is an explicit persistent sampler: It is explicit in that
it relays information about $\tilde{\mu}(i)$ along with $i$, and persistent in that it simulates (with high
probability) a sequence of $s$ independent samples from the same $\tilde{\mu}$.

Definition 4.3.2. Given a distribution over distributions, a $(\delta, s)$-explicit persistent sampler
is an algorithm that can be run up to $s$ times (and during each run may store information to
be used in subsequent runs), that in every run returns a pair $(i, \eta)$. It must satisfy that with
probability at least $1 - \delta$, the $i$'s for all $s$ runs are independent samples of a single distribution
$\tilde{\mu}$ that in itself was drawn according to the distribution over distributions, and every output
pair $(i, \eta)$ satisfies $\eta = \tilde{\mu}(i)$.

Algorithm 4.3.3. (Persistent Sampler) The algorithm is given parameters $\epsilon$, $\delta$ and $s$, and has
conditional sample access to a distribution $\mu$.

1. On the initial run, set $T$ to be a full balanced binary tree with $n$ leaves labeled by $[n]$. Let
$w$ denote the root vertex and $U$ denote the set of non-leaf vertices. $\alpha$ is initially unset.

2. On all runs, set $u_1 = w$, and repeat the following for $l = 1, \ldots, \log n$.

(a) If $\alpha(u_l)$ is not set yet, set it to the result of the $\left(\left(\frac{\epsilon}{2\log(n)}\right)^2, \frac{\delta}{s\log n}\right)$-Ratio Estimator
(Algorithm 4.2.4); run it independently of prior runs.

(b) Independently of any prior choices, and without sampling from $\mu$, with probability
$\alpha(u_l)$ set $u_{l+1}$ to be the left child of $u_l$ and $p_l = \alpha(u_l)$, and with probability $1 - \alpha(u_l)$
set $u_{l+1}$ to be the right child of $u_l$ and $p_l = 1 - \alpha(u_l)$.

3. Set $i$ to be the label of the leaf $u_{\log(n)+1}$ and $\eta = \prod_{l=1}^{\log n} p_l$. Return $i$ and $\eta$.
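The persistence mechanism (estimate each ratio at most once, then reuse it in every later run) can be sketched as a small class. This is an illustration only: the class name, the interval encoding `(lo, hi)` of tree nodes, 0-indexed leaves, and the `estimate_ratio` callback standing in for the Ratio Estimator are all assumptions, not the paper's notation.

```python
import random

class PersistentSampler:
    """Lazily fills alpha on visited nodes and reuses the stored values across runs."""

    def __init__(self, n, estimate_ratio, rng):
        self.n = n                            # number of leaves, a power of 2
        self.estimate_ratio = estimate_ratio  # callback (lo, mid, hi) -> ratio in [0, 1]
        self.rng = rng
        self.alpha = {}                       # node (lo, hi) -> stored ratio

    def sample(self):
        """One run: returns (i, eta), where eta is the product of the p_l on the path."""
        lo, hi, eta = 0, self.n, 1.0
        while hi - lo > 1:
            mid = (lo + hi) // 2
            if (lo, hi) not in self.alpha:    # step (a): estimate once, then persist
                self.alpha[(lo, hi)] = self.estimate_ratio(lo, mid, hi)
            a = self.alpha[(lo, hi)]
            if self.rng.random() < a:         # step (b): go left with probability alpha(u)
                eta, hi = eta * a, mid
            else:
                eta, lo = eta * (1 - a), mid
        return lo, eta
```

Plugging in exact ratios as the callback makes every returned `eta` equal to the true probability of the returned leaf, which is a convenient sanity check; with a noisy estimator, `eta` is instead the probability under the reconstituted distribution.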



Lemma 4.3.4. For any $\epsilon$, $\delta$ and $s$, Algorithm 4.3.3 is a $(\delta, s)$-explicit persistent sampler for
the $(\epsilon, \frac{\delta}{s\log n})$-condensation of $\mu$. It uses a total of $2^5 \cdot \epsilon^{-4}\log^5 n \cdot \log(s\delta^{-1}\log n)$ many adaptive
conditional samples from $\mu$ to output a sample.

Proof. The calculation of the number of samples is straightforward (but note that these are
adaptive now). During $s$ runs, by the union bound with probability at least $1 - \delta$ all of the calls
to the $\left(\left(\frac{\epsilon}{2\log(n)}\right)^2, \frac{\delta}{s\log n}\right)$-Ratio Estimator produced results that are not more than $\left(\frac{\epsilon}{2\log(n)}\right)^2$
away from the actual ratios.

Conditioned on the above event, the algorithm acts the same as the algorithm that first
chooses for every $u \in U$ the value $\alpha(u)$ according to a run of the $\left(\left(\frac{\epsilon}{2\log(n)}\right)^2, \frac{\delta}{s\log n}\right)$-Ratio
Estimator conditioned on it being successful, and only then traverses the tree $T$ for every
required sample. The latter algorithm is identical to picking a distribution $\tilde{\mu}$ according to the
$(\epsilon, \frac{\delta}{s\log n})$-condensation of $\mu$, and then (explicitly) sampling from it. □

This is almost sufficient to learn the distribution. The next step would be to estimate the
size of a bucket of the $\epsilon$-fine distribution $\tilde{\mu}$ by explicit sampling (i.e. getting the samples along
with their probabilities). However, Lemma 4.1.5 requires an approximation not of $\tilde{\mu}(M_j)$ (where
$M_j$ is a bucket of $\tilde{\mu}$) but rather of $|M_j| \frac{(1+\epsilon)^{j-1}}{n}\epsilon$. In other words, we really need to approximate
$\bar{\mu}(M_j)$, where $\bar{\mu}$ is the corresponding trimmed distribution.

Therefore we define the following explicit sampler for an $\epsilon$-trimmed distribution. We "bend"
the definition a little, as this sampler will not be able to provide the corresponding probability
for $i = 0$.

Algorithm 4.3.5. (Trimming Sampler) The algorithm is given parameters $\epsilon$, $\delta$ and $s$, and has
conditional sample access to a distribution $\mu$.

1. Run the Persistent Sampler (Algorithm 4.3.3) with parameters $\epsilon$, $\delta$ and $s$ to obtain $i$ and
$\eta$; additionally retain $p_1, \ldots, p_{\log n}$ as calculated during the run of the Persistent Sampler.

2. If there exists $l$ for which $p_l < \frac{\epsilon}{2\log(n)}$ then return "0".

3. If $\eta < \frac{\epsilon}{n}$ then return "0".

4. Otherwise, let $j$ be the largest integer for which $\frac{(1+\epsilon)^{j-1}}{n}\epsilon \le \eta$, and set $\eta' = \frac{(1+\epsilon)^{j-1}}{n}\epsilon$.

5. With probability $1 - \eta'/\eta$ return "0", and with probability $\eta'/\eta$ return $(i, j)$ (where $j$
corresponds to $\bar{\mu}(i) = \eta'$).
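Steps 2 to 5 of the Trimming Sampler are a post-processing of one Persistent Sampler output, and can be sketched as below. The thresholds $\frac{\epsilon}{2\log n}$ and $\frac{\epsilon}{n}$ are taken from the definition of $B$ in the proof of Lemma 4.2.7 and from Definition 4.2.8; the function name `trim`, the base-2 logarithm, and the argument layout are illustrative assumptions.

```python
import math
import random

def trim(i, eta, ps, n, eps, rng):
    """Post-process one persistent-sampler output (i, eta) with path values ps.

    Returns 0, or a pair (i, j) where j is the bucket index of the trimmed weight.
    """
    if any(p < eps / (2 * math.log2(n)) for p in ps):  # step 2: i belongs to B
        return 0
    if eta < eps / n:                                  # step 3: below the first bucket
        return 0
    # Step 4: largest j with (1+eps)^(j-1) * eps / n <= eta.
    j = math.floor(math.log(eta * n / eps) / math.log(1 + eps)) + 1
    eta_trimmed = (1 + eps) ** (j - 1) * eps / n
    # Step 5: keep the sample with probability eta_trimmed / eta.
    return (i, j) if rng.random() < eta_trimmed / eta else 0
```

The rejection in step 5 is what moves the leftover mass $\eta - \eta'$ of each element onto the extra outcome 0.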
The following observation is now easy.

Observation 4.3.6. The trimming sampler (Algorithm 4.3.5) is a $(\delta, s)$-persistent sampler,
and explicit whenever the returned sample is not 0, for the distribution over distributions that
results from taking the $\epsilon$-trimming of an $\epsilon$-fine distribution $\tilde{\mu}$ and its corresponding $B$ that was
drawn according to the $(\epsilon, \frac{\delta}{s\log n})$-condensation of $\mu$. The algorithm uses in total $2^5 \cdot \epsilon^{-4}\log^5 n \cdot
\log(s\delta^{-1}\log n)$ many adaptive conditional samples from $\mu$ to output a sample.

Proof. The number of samples is inherited from Algorithm 4.3.3, as no other samples are taken.
The algorithm switches the return value to "0" whenever $i \in B$ (as defined in the proof of
Lemma 4.2.7), and otherwise returns "0" exactly according to the corresponding conditional
probability difference for $i$ between $\tilde{\mu}$ (as in the definition of a reconstituted distribution) and $\bar{\mu}$
(as in the definition of the corresponding trimmed distribution). Finally, whenever the returned
sample is $i > 0$ the algorithm clearly returns the corresponding $j_i$ (see Definition 4.2.8). □

We are now ready to present the algorithm providing Theorem 4.0.2.

Algorithm 4.3.7. (Distribution Approximation) The algorithm is given parameters $\epsilon$, $\delta$, and
has conditional sample access to a distribution $\mu$.

1. Set $s = 2^{12}\epsilon^{-4}\log^2(n)\log(\delta^{-1})$, and $k = \frac{\log(8n/\epsilon)}{\log(1+\epsilon/8)}$ (the number of buckets in an $\epsilon/8$-bucketing
of a distribution over $[n]$).

2. Take $s$ samples through the $(\epsilon/8, \delta/2, s)$-Trimming Sampler.

3. Denote by $s_0$ the number of times that the sampler returned "0", and for $1 \le j \le k$
denote by $s_j$ the number of times that the sampler returned $(i, j)$ for any $i$.

4. Let $m'_0, \ldots, m'_k$ be the bucketization of $\alpha_0 = \frac{s_0}{s}, \ldots, \alpha_k = \frac{s_k}{s}$.

5. Return the tentative distribution according to $m'_0, \ldots, m'_k$.



Lemma 4.3.8. The Distribution Approximation algorithm (Algorithm 4.3.7) will with probability
at least $1 - \delta$ return a distribution that is $\epsilon$-close to a permutation of $\mu$. This is performed
using at most $O(\epsilon^{-8}\log^7 n \log^2(\delta^{-1}))$ conditional samples.

Proof. The number of samples is immediate from the algorithm statement and Observation 4.3.6.

By Observation 4.3.6, with probability at least $1 - \delta/2$ all samples of the Trimming Sampler
will be from one $\epsilon/8$-trimming $\bar{\mu}$ of some $\epsilon/8$-fine distribution $\tilde{\mu}$. Set $m_0 = |\{1 \le i \le n : \bar{\mu}(i) = 0\}|$
and for $1 \le j \le k$ set $m_j = |\{i : \bar{\mu}(i) = \frac{(1+\epsilon/8)^{j-1}}{n}\epsilon/8\}|$. Recall that the $\epsilon/8$-renormalized distribution
corresponding to $\bar{\mu}$ is in fact the tentative distribution according to $m_0, \ldots, m_k$. By Lemma
4.2.9 this distribution is $\epsilon/2$-close to $\mu$.

Note now that for every $1 \le j \le k$ the expectation of $\alpha_j$ is exactly $m_j \frac{(1+\epsilon/8)^{j-1}}{n}\epsilon/8$. By virtue
of a Chernoff bound and the union bound, our choice of $s$ implies that with probability $1 - \delta/2$
(conditioned on the previous event) the values $\alpha_j$ are in fact close enough to these expectations, for
every $1 \le j \le k$, to satisfy the assertions of Lemma 4.1.5 (with $\epsilon/8$ in place of $\epsilon$). Thus the tentative
distribution according to $m'_0, \ldots, m'_k$ will be $\epsilon/2$-close to the tentative distribution according to
$m_0, \ldots, m_k$, and hence will be $\epsilon$-close to $\mu$. □

Note that if we were to use this algorithm for testing purposes, the dependence on $\delta^{-1}$ can
be made logarithmic by setting it to 1/3 and repeating the algorithm $\log(\delta^{-1})$ times, taking
majority.

5 Non-adaptive testing for uniformity and identity

In this section we return to the definition of bucketing introduced in the preliminaries (Definition 
5.1 Testing uniformity 

Theorem 5.1.1. Testing uniformity can be done using $\mathrm{poly}(\log n, \epsilon^{-1}, \log(\delta^{-1}))$ non-adaptive
conditional samples.

Again, we will actually prove the following stronger statement:

Theorem 5.1.2 (Nonadaptive Near Uniformity Tester). Let $\mu$ be a known distribution over
$[n]$. If $\|\mu - U_n\|_\infty < \epsilon/8n$ then identity with $\mu$ can be tested using $\mathrm{poly}(\log n, \epsilon^{-1}, \log(\delta^{-1}))$
conditional samples by a non-adaptive algorithm.

To simplify analysis and presentation, the algorithm will succeed with probability 2/3. This
can be amplified to $1 - \delta$ by the standard technique of repeating it for $\log(\delta^{-1})$ times and
taking the majority vote. This obviously incurs a multiplicative factor of $\log(\delta^{-1})$ in the sample
complexity.
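The amplification step can be sketched as follows. The constant 18 is an assumption of this sketch: it comes from Hoeffding's inequality, which bounds the probability that the majority of `reps` independent 2/3-correct answers is wrong by $\exp(-\mathrm{reps}/18)$. The text itself only states the $\log(\delta^{-1})$ order of growth, and `amplify` and `tester` are hypothetical names.

```python
import math

def amplify(tester, delta, reps_constant=18):
    """Run a 2/3-correct randomized tester repeatedly and return the majority answer.

    tester is a zero-argument callable returning True (accept) or False (reject).
    """
    # An odd number of repetitions, at least reps_constant * ln(1/delta).
    reps = 2 * math.ceil(reps_constant * math.log(1 / delta) / 2) + 1
    accepts = sum(1 for _ in range(reps) if tester())
    return accepts > reps // 2
```

Each call to `tester` is assumed to use fresh randomness and fresh samples, so the total sample complexity is multiplied by `reps`.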

Algorithm 5.1.3. The algorithm is given $n$, $\epsilon$ and $\mu$, and has nonadaptive conditional sample
access to $\mu'$.

1. For $\lceil\log(28800\epsilon^{-6}\log^5(n))\rceil \le j \le \lceil\log(n)\rceil$, set $U_j$ to be a uniformly random set of
$\min\{n, 2^j\}$ indices.

2. For every $U_j$, perform $16\epsilon^{-2}\log^2(n)$ conditional samples, and if the same index was drawn
twice, REJECT.

3. Uniformly pick a random set U of 1980e~ 6 log 5 (n) elements, and invoke the Identity Tester 
of Theorem 12.2.11 to test whether // \u= fi \u or d(fj,' \u,H \u) > 2^u] w ^h success 
probability 

4. ACCEPT unless any of the above testers rejected. 

Lemma 5.1.4. If $d(\mu,\mu') = 0$ then Algorithm 5.1.3 accepts with probability at least 2/3.

Proof. Since \\fj, — C/ n ||oo < e/8n, the probability that an element will be drawn twice in the jth 
iteration of Line[2]is at most ( 16e ^ g ^ n ^) • ^ j^/s ) ■2~ 2j '• Summation over all values of j gives 
us less than 1/9. 

Since $\mu = \mu'$, we have $\mu'|_U = \mu|_U$ for any $U \subseteq [n]$, and so the probability that Line 3 rejects is at most
1/9. This obtains the error bound in the lemma. □

The following is immediate from the algorithm statement and Theorem 2.2.1.

Lemma 5.1.5. The sample complexity of Algorithm 5.1.3 is $\mathrm{poly}(\log n, \epsilon^{-1})$.

Proof. This follows from the number of samples used in Lines 2 and 3 and the fact that Line 2
is iterated at most $\log n$ times. □

In the following we assume that $d(\mu,\mu') > \epsilon$.

Let Mi, M 2 , . . . , M k be the bucketing of fj, and M[, M' 2 , . . . , M' k the bucketing of fi' with e/3. 
Denote the individual probabilities by pi, . . . ,p n and p[, ■ ■ ■ ,p' n respectively. 
Lemma 5.1.6. \ML U Mil > en and there exists 2 < j < k such that \M'A > , £ ," N j . 

1 U li — •> — '3' — 24(l+e/3) J logn 

Proof. Note that [n] = Mq U M\ by our requirement from \i. Now following Lemma 13.1.71 
^2 p ' <Pi (Pi -p'i) < \{i ■ p[ < Pi}\ ■ max{pj}. Now since maxjfe} < (1 + there are at least 

(e/2)n such i. 

For the second part we will adapt the proof of Lemma 13.1.61 Clearly X^ Pi <p'< Pi +iie/i2n(^ ~~ 
Pi) < Therefore: 

^ > Y ( P 'i ~Pi)= S ^ ~Pi)- (Pi ~ P^> > 

p(>Pi+lle/12n p^>ft+lle/12n p(> P j p i <p' i <p i +lle/12n 

Since pi > we know that the in the left hand side have (assuming e < 1/10) 

' > , lle _ 1 + 19e/24 > (l + e/3) 2 

n Yin n ~ n 

and therefore all these p[s are in buckets Mj for 2 < j < k. 

Since fc = i og (°+"/ 3 ) , there exists some 2 < j < k such that u'(M'-) = e ■ By the 
definition of the buckets this gives |Mj| > ■ > 24( i + ^io g n - □ 

Lemma 5.1.7. Given a set $B$ of size $l$, a set $U$ of $\min\{n, \frac{3n}{l}\}$ indices chosen uniformly at
random will with probability more than $\frac{19}{20}$ contain a member of $B$.

Proof. The probability is lower bounded by the probability for $3n/l$ indexes chosen uniformly
and independently with repetitions from $[n]$ to intersect $B$, which is $1 - (1 - l/n)^{3n/l} > 1 - e^{-3} > \frac{19}{20}$. □
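The bound used in this proof is easy to check numerically. The sketch below evaluates the probability that $3n/l$ independent uniform draws (rounded up, which is an assumption of the sketch) all miss a set of size $l$. Since $1 - x \le e^{-x}$, this is at most $e^{-3} \approx 0.0498$. The function name `miss_probability` is illustrative.

```python
import math

def miss_probability(n, l):
    """Probability that ceil(3n/l) independent uniform draws from [n] all miss a set of size l."""
    draws = math.ceil(3 * n / l)
    return (1 - l / n) ** draws
```

For instance, with `n = 1024` the value stays below `math.exp(-3)` for every set size `l` between 1 and `n`.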

Lemma 5.1.8. Let $\mu$ be a known distribution over $[n]$. If $\|\mu - U_n\|_\infty < \epsilon/8n$ and $d(\mu,\mu') > \epsilon$
then Algorithm 5.1.3 rejects with probability at least 2/3.

Proof. We partition into cases according to the $j$ guaranteed by Lemma 5.1.6.

If (1 + §)•?' < 40e" 4 log 4 n, then |Mj| > [° g- n, so by Lemma EXT] with probability ±§ 

the set U in Line [3] will contain a member h of ML Note that j > 2 and therefore fi'(h) > 

^ 1+ t/ 3 ^ • the nrs ^ P ar t OI " Lemma 15.1.61 with probability ^§ (actually much more than that) 
we will also sample an element I S Mq U M{. Thus we have fj,'(h) > (1 + e/3)p'(l), and also 
A*' tc/ (fr) > (1 + e/3)// tc; (Z) 3 while both /x frj (/i) and \i \u (I) are restricted between | + ^g njr 
and ¥^TEW\- Therefore, either // \ v (h) > fi \u (h) + or // ^ (/) < \i \ v (I) - 
Either way d(u' tt/>M IV) > 2 4\u\ ' wri ich will be identified by the tester of Theorem 12.2.11 with 
probability Thus in total we get a rejection probability greater than |. 

Otherwise, let i be such that the value 2 l is between min{n, 720e~ 2 log n(l + |) J } and 
2min{n, 720e~ 2 log n(l + |) J } (recall the lower bound on (1 + §)■'). In that case the Ui in 
Line [2] will with probability at least g§ contain a member a of Mj . Additionally, the expected 

value of fJ>'(Ui) is min{l, ^} < min{l, i 4 ^e~ 2 (l + |) J logn}, thus by Markov's inequality, with 
probability at least | we will have n'{Ui) < min{l, 14 ^ 00 e " 2 (i + |) J logn}. Therefore, y! 

t 2 „, . , . , , „ . . . log n 



(a) > 14 4oo( 1 ^ e/ /3) iogn ' Thus the expected number of times a is sampled is at least and 
therefore by Lemma 12.2.21 with probability 1 — 2exp(— ^|^) we will sample a at least twice. 
Thus in total we get a rejection probability greater than X for n > 2 253 (this lower bound can 
be traded for a higher degree polynomial dependence on logn). □ 
5.2 Testing identity to a known distribution

Theorem 5.2.1. Identity to a known distribution can be tested using $\mathrm{poly}(\log n, \epsilon^{-1}, \log(\delta^{-1}))$
non-adaptive conditional samples.

Let $\mu$ be the known distribution and $\mu'$ be the unknown distribution that is accessed by
sampling. The following is an algorithm for testing identity with the known distribution $\mu$ over
$[n]$.

Algorithm 5.2.2. (Identity Test) The algorithm receives $\epsilon$, $\delta$, $n$ and $\mu$ and operates as follows.

1. Let $M = \{M_0, M_1, \ldots, M_k\} \leftarrow \mathrm{Bucket}(\mu, [n], \frac{\epsilon}{8})$.

2. For each bucket $M_1, \ldots, M_k$ test using the Nonadaptive Near Uniformity Test (Theorem
5.1.2) to check whether $\|\mu|_{M_j} - \mu'|_{M_j}\|_1 > \epsilon/2$ with error bound $\frac{\delta\log(1+\epsilon/8)}{2\log n}$, rejecting
immediately if any test rejects.

3. Invoke the Identity Tester of Theorem 2.2.1 to test if $\|\mu_{\langle M \rangle} - \mu'_{\langle M \rangle}\|_1 > \epsilon/2$ with error
bound $\delta/2$, answering as the test does.

Lemma 5.2.3. If $d(\mu,\mu') = 0$ then Algorithm 5.2.2 accepts with probability at least $1 - \delta$.

Proof. In this case, for all buckets $\|\mu|_{M_j} - \mu'|_{M_j}\|_1 = 0$ and $\|\mu_{\langle M \rangle} - \mu'_{\langle M \rangle}\|_1 = 0$, and thus
by the union bound we obtain the statement. □

Lemma 5.2.4. The sample complexity of Algorithm 5.2.2 is $\mathrm{poly}(\log n, \epsilon^{-1}, \log(\delta^{-1}))$.

Proof. We invoke the Nonadaptive Near Uniformity Test $\frac{\log n}{\log(1+\epsilon/8)}$ times, and invoke the Identity
Tester with a distribution of support size $\frac{\log n}{\log(1+\epsilon/8)}$. Therefore by Lemma 5.1.5 and Theorem
2.2.1 we obtain the bound in the statement. □

Lemma 5.2.5. If $d(\mu,\mu') > \epsilon$, then Algorithm 5.2.2 rejects with probability at least $1 - \delta$.

Proof. Assume that the test accepted. If no error was made, then by Lemma 2.2.6 we have that
$d(\mu,\mu') \le \epsilon$. By the union bound the probability of error is at most $\delta$. □

6 Lower bounds for label invariant properties 

In this section we prove two sample complexity lower bounds for testing label-invariant distribution
properties in our model. The first is for testing uniformity, and applies to non-adaptive
algorithms. The second bound is for testing whether a distribution is uniform over some subset
$U \subseteq \{1, \ldots, n\}$ of size exactly $2^{2k}$ for some $k$, and applies to general (adaptive) algorithms.

The analysis as it is written relies on the particular behavior of our model when conditioning
on a set of probability zero, but this can be done away with: Instead of a distribution $\mu$
with probabilities $p_1, \ldots, p_n$ over $[n]$, we can replace it with the $o(1)$-close distribution $\tilde{\mu}$ with
probabilities p\, . . . ,pi where p, = \ + (1 — -)p%- The same analysis of why an algorithm will 
fail to correctly respond to $\mu$ will pass on to $\tilde{\mu}$, which has no zero probability sets.

6.1 Preliminary definitions

We start with some definitions that are common to both lower bounds. 

First, an informal reminder of Yao's method for proving impossibility results for general 
randomized algorithms: Suppose that there is a fixed distribution over "positive" inputs (inputs 
that should be accepted) and a distribution over "negative" inputs, so that no deterministic 
algorithm of the prescribed type can distinguish between the two distributions. That is, suppose 
that for every such algorithm, the difference in the acceptance probability over both input 
distributions is o(l). This will mean that no randomized algorithm can distinguish between 
these distributions as well, and hence for every possible randomized algorithm there is a positive 
instance and a negative instance so that it cannot be correct for both of them. 

In our case an "input" is a distribution $\mu$ over $\{1,\ldots,n\}$, and so a "distribution over inputs"
is in fact a distribution over distributions. To see why a distribution over distributions cannot
be replaced with just a single "averaged distribution", consider the following example. Assume
that an algorithm takes two independent samples from a distribution $\mu$ over $\{1,2\}$. If $\mu$ is with
probability $\frac{1}{2}$ the distribution always giving 1, and with probability $\frac{1}{2}$ the distribution always
giving 2, then the two samples will be either (1,1) or (2,2), each with probability $\frac{1}{2}$. This
can never be the case if we had used a fixed distribution for $\mu$ rather than a distribution over
distributions.
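
The example can be checked mechanically. The following is a minimal sketch, not part of the original argument; the helper name `pair_law` is hypothetical. It computes the exact law of two independent samples, once when the sampled distribution is itself drawn at random and once for the single averaged distribution.

```python
from fractions import Fraction
from itertools import product

def pair_law(mixture):
    """Exact law of two independent samples from a distribution that is
    itself drawn from `mixture`, a list of (weight, {outcome: probability})."""
    law = {}
    for weight, dist in mixture:
        for a, b in product(dist, repeat=2):
            law[(a, b)] = law.get((a, b), 0) + weight * dist[a] * dist[b]
    # Keep only the pairs that can actually occur.
    return {pair: p for pair, p in law.items() if p > 0}

half = Fraction(1, 2)
always_1 = {1: Fraction(1), 2: Fraction(0)}
always_2 = {1: Fraction(0), 2: Fraction(1)}

# Distribution over distributions: each point mass is picked with probability 1/2.
over_distributions = pair_law([(half, always_1), (half, always_2)])

# The single "averaged" distribution: uniform over {1, 2}.
averaged = pair_law([(Fraction(1), {1: half, 2: half})])

print(over_distributions)
print(averaged)
```

The averaged distribution puts probability 1/4 on each of the four pairs, while the distribution over distributions never produces (1,2) or (2,1). More generally, a fixed distribution with probabilities $p$ and $1-p$ would need $p^2 = (1-p)^2 = \frac{1}{2}$, which is impossible.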

What it means to be a deterministic version of our testers will be defined below; as with 
other settings, these result from fixing in advance the results of the coin tosses of the randomized 
testers. The following are the two distributions over distributions that we will use to prove lower 
bounds (and a third which will simply be "pick the uniform distribution over {1, . . . , n} with 
probability 1"). 

Definition 6.1.1. Given a set $U \subseteq \{1,\ldots,n\}$, we define the $U$-distribution to be the uniform
distribution over $U$, that is we set $p_i = 1/|U|$ if $i \in U$ and $p_i = 0$ otherwise.
The even uniblock distribution over distributions is defined by the following:

1. Uniformly choose an integer $k$ such that $\frac{1}{8}\log n \le k \le \frac{3}{8}\log n$.

2. Uniformly (from all possible such sets) pick a set $U \subseteq \{1,\ldots,n\}$ of size exactly $2^{2k}$.

3. The output distribution $\mu$ over $\{1,\ldots,n\}$ is the $U$-distribution (as defined above).
The odd uniblock distribution over distributions is defined by the following:

1. Uniformly choose an integer $k$ such that $\frac{1}{8}\log n \le k \le \frac{3}{8}\log n$.

2. Uniformly (from all possible such sets) pick a set $U' \subseteq \{1,\ldots,n\}$ of size exactly $2^{2k+1}$.

3. The output distribution $\mu$ over $\{1,\ldots,n\}$ is the $U'$-distribution.

Finally, we also identify the uniform distribution as a distribution over distributions that 
picks with probability 1 the uniform distribution over {1, . . . ,n}. 
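
The construction above can be sketched as a sampler. This is an illustration under stated assumptions, not the paper's code: the function names are hypothetical, logarithms are taken base 2, and the range of $k$ is assumed to be the integers between $\frac{1}{8}\log n$ and $\frac{3}{8}\log n$.

```python
import math
import random

def draw_uniblock_support(n, odd, rng):
    """Draw k and the support set of one distribution from the even
    (odd=False) or odd (odd=True) uniblock distribution over distributions.

    Assumption: k is uniform over the integers in [(1/8) log2 n, (3/8) log2 n].
    """
    log_n = math.log2(n)
    k = rng.randint(math.ceil(log_n / 8), math.floor(3 * log_n / 8))
    size = 2 ** (2 * k + (1 if odd else 0))
    # Uniform subset of {1, ..., n} of the prescribed size.
    return k, set(rng.sample(range(1, n + 1), size))

def u_distribution(support):
    """The U-distribution: uniform over `support`, zero elsewhere."""
    return {i: 1.0 / len(support) for i in support}

rng = random.Random(0)
n = 2 ** 16
k_even, U = draw_uniblock_support(n, odd=False, rng=rng)
k_odd, U_prime = draw_uniblock_support(n, odd=True, rng=rng)
mu = u_distribution(U)
print(k_even, len(U), k_odd, len(U_prime))
```

With $n = 2^{16}$ the assumed range gives $2 \le k \le 6$, so the even supports have between $2^4$ and $2^{12}$ elements and the odd supports between $2^5$ and $2^{13}$.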

For these to be useful for Yao arguments, we first note their farness properties. 

Observation 6.1.2. Any distribution over $\{1,\ldots,n\}$ that may result from the even uniblock
distribution over distributions is $\frac{1}{2}$-far from the uniform distribution over $\{1,\ldots,n\}$, as well as
$\frac{1}{2}$-far from any distribution that may result from the odd uniblock distribution over distributions.

Proof. This follows directly from a variation distance calculation. Specifically, the variation
distance between a uniform distribution over $U$ and (a permutation of) a uniform distribution
over $V$ with $|V| > |U|$ is minimized when the permutation is such that $U \subseteq V$, in which case it
equals $(|V|-|U|)/|V|$. In our case we always have $|V| \ge 2|U|$, and hence the lower bound. □
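
The calculation in the proof can be verified on small instances. The sketch below is illustrative only (the helper names are hypothetical); it computes the variation distance as half the $\ell_1$ distance and compares the nested, partially overlapping, and disjoint cases.

```python
from fractions import Fraction

def uniform_on(support):
    """Uniform distribution over `support`, as a dict of exact fractions."""
    return {i: Fraction(1, len(support)) for i in support}

def variation_distance(p, q):
    """Half the L1 distance between two distributions given as dicts."""
    keys = set(p) | set(q)
    return sum(abs(p.get(i, 0) - q.get(i, 0)) for i in keys) / 2

U = set(range(1, 17))         # |U| = 16
V = set(range(1, 33))         # |V| = 32 and U is contained in V
V_shifted = set(range(9, 41)) # |V| = 32, shares 8 elements with U
V_far = set(range(17, 49))    # |V| = 32, disjoint from U

nested = variation_distance(uniform_on(U), uniform_on(V))
partial = variation_distance(uniform_on(U), uniform_on(V_shifted))
disjoint = variation_distance(uniform_on(U), uniform_on(V_far))
print(nested, partial, disjoint)
```

The nested case attains $(|V|-|U|)/|V| = \frac{1}{2}$, and moving $V$ away from $U$ only increases the distance.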
All throughout this section we consider properties that are label-invariant (such as the
properties of being in the support of the distributions defined above). This allows us to simplify 
the analysis of our algorithms. 

First, some technical definitions. 

Definition 6.1.3. Given $A_1,\ldots,A_r \subseteq \{1,\ldots,n\}$, the atoms generated by $A_1,\ldots,A_r$ are all
sets of the type $\bigcap_{j=1}^{r} C_j$ where every $C_j$ is one of $A_j$ or $\{1,\ldots,n\} \setminus A_j$. In other words, these
are the minimal (by containment) non-empty sets that can be created by boolean operations
over $A_1,\ldots,A_r$. The family of all such atoms is called the partition generated by $A_1,\ldots,A_r$;
when $r = 0$ that partition includes the one set $\{1,\ldots,n\}$.

Given $A_1,\ldots,A_r$ and $j_1,\ldots,j_r$ where $j_i \in A_i$ for all $i$, the $r$-configuration of $j_1,\ldots,j_r$ is
the information for any $1 \le l,k \le r$ of whether $j_k \in A_l$ (or equivalently, which is the atom that
contains $j_k$) and whether $j_k = j_l$.
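
To make the two notions concrete, here is a small sketch (hypothetical helper names, plain Python sets). It groups elements by their membership pattern to obtain the atoms, and records the membership and equality relations that make up a configuration. It also checks that relabeling the domain by a permutation leaves the configuration unchanged, which is the reason configurations suffice for label-invariant properties.

```python
def atoms(n, sets):
    """Partition of {1..n} generated by `sets`: elements are grouped by
    which of the sets contain them, so only non-empty atoms appear."""
    groups = {}
    for x in range(1, n + 1):
        pattern = tuple(x in A for A in sets)
        groups.setdefault(pattern, set()).add(x)
    return groups

def configuration(sets, samples):
    """For all k, l: whether j_k lies in A_l, and whether j_k equals j_l."""
    r = len(samples)
    membership = tuple(tuple(samples[k] in sets[l] for l in range(r)) for k in range(r))
    equality = tuple(tuple(samples[k] == samples[l] for l in range(r)) for k in range(r))
    return membership, equality

A1, A2 = {1, 2, 3}, {3, 4}
parts = atoms(6, [A1, A2])
config = configuration([A1, A2], [2, 3])   # j_1 = 2 in A_1, j_2 = 3 in A_2

# Relabel the domain by the permutation x -> 7 - x.
sigma = {x: 7 - x for x in range(1, 7)}
relabeled = configuration([{sigma[x] for x in A1}, {sigma[x] for x in A2}],
                          [sigma[2], sigma[3]])
print(parts)
print(config == relabeled)
```

Calling `atoms(6, [])` returns the single atom $\{1,\ldots,6\}$, matching the convention for $r = 0$.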

The label-invariance of all properties discussed in this section will allow us to "simplify" our 
algorithms prior to proving lower bounds. We next define a simplified version of a non-adaptive 
algorithm. 

Definition 6.1.4. A core non-adaptive distribution tester is a non-adaptive distribution tester,
that in its last phase bases its decision to accept or reject only on the $t(\epsilon)$-configuration of its
received samples and on its internal coin tosses.

For a core non-adaptive tester, fixing the values of the internal "coins" in advance gives a
very simple deterministic counterpart (for use in Yao arguments): The algorithm now consists
of a sequence of fixed sets $A_1,\ldots,A_{t(\epsilon)}$, followed by a function assigning to every possible
$t(\epsilon)$-configuration a decision to accept or reject.

We note that indeed in the non-adaptive setting we only need to analyze core algorithms: 

Observation 6.1.5. A non-adaptive testing algorithm for a label-invariant property can be 
converted to a corresponding core algorithm with the same sample complexity. 

Proof. We start with the original algorithm, but choose a uniformly random permutation $\sigma$
of $\{1,\ldots,n\}$ and have the algorithm act on the correspondingly permuted input distribution,
rather than the original one. That is, every set $A_i$ that the algorithm conditions on is converted
to $\{\sigma(k) : k \in A_i\}$, while instead of $j_i$ the algorithm receives $\sigma^{-1}(j_i)$. This clearly preserves the
guaranteed bounds on the error probability if the property is label-invariant.

To conclude, note that due to the random permutation, all outcomes for $j_1,\ldots,j_t$ that
satisfy a given configuration are equally likely, and hence can be simulated using internal coin
tosses once the configuration itself is made known to the algorithm. □

For an adaptive algorithm, the definition will be more complex. In fact we will need to 
set aside some "external" coin tosses, so that also the "deterministic" counterpart will have a 
probabilistic element. But it will be a manageable one. 

Definition 6.1.6. A core adaptive distribution tester is an adaptive distribution tester, that 
acts as follows. 

• In the $i$'th phase, based only on the internal coin tosses and the configuration of the sets
$A_1,\ldots,A_{i-1}$ and $j_1,\ldots,j_{i-1}$, the algorithm assigns a number $k_A$ for every atom $A$ that is
generated by $A_1,\ldots,A_{i-1}$, between $0$ and $|A \setminus \{j_1,\ldots,j_{i-1}\}|$, where not all such numbers
are 0. Additionally the algorithm provides $K_i \subseteq \{1,\ldots,i-1\}$.

• A set $B_i \subseteq \{1,\ldots,n\} \setminus \{j_1,\ldots,j_{i-1}\}$ is drawn uniformly among all such sets whose
intersection with every atom $A$ as above is of size $k_A$, and $A_i$ is set to $B_i \cup \{j_k : k \in K_i\}$.
The random draw is done independently of prior draws and the algorithm's own internal
coins, and $A_i$ is not revealed to the algorithm (however, the algorithm will be able to
calculate the sizes of the atoms in the partition generated by $A_1,\ldots,A_i$ using the $(i-1)$-configuration,
and the numbers provided based on it and the internal coin tosses).

• A sample $j_i$ is drawn according to $\mu$ conditioned over $A_i$, independently of all other draws.
$j_i$ is not revealed to the algorithm, but the new $i$-configuration is revealed (in other words,
the new information that the algorithm receives is whether $j_i \in A_k$ and whether $j_i = j_k$
for each $k < i$).

• After $t(\epsilon)$ such phases, the algorithm bases its decision to accept or reject only on the
$t(\epsilon)$-configuration of its received samples and on its internal coin tosses.

Note that also a "deterministic" version of the above algorithm acts randomly, but only in 
a somewhat "oblivious" manner. The sets Ai will still be drawn at random, but the decisions 
that the algorithm is allowed to make about them (through the kA numbers and the Ki sets) 
as well as the final decision whether to accept or reject will all be deterministic. This is since a 
deterministic version fixes the algorithm's internal coins and only them. 
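
One phase of such a tester can be sketched as follows. This is an illustrative reading of the definition, not the paper's code: the function names and the 0-based indexing of previous samples are assumptions. Drawing an independent uniform subset of the prescribed size inside each atom yields a uniform set among all sets with the prescribed intersection sizes, since the atoms are disjoint. The conditional sampler returns a uniform element when the conditioning set has zero weight, following the convention of the model used in this paper.

```python
import random

def draw_conditioning_set(atom_list, previous_samples, k, K, rng):
    """Draw A_i = B_i united with {j_l : l in K}.

    atom_list: atoms generated by A_1..A_{i-1} (disjoint sets);
    k: the number k_A requested from each atom, in the same order;
    K: 0-based indices into previous_samples (an indexing assumption)."""
    seen = set(previous_samples)
    B = set()
    for atom, k_A in zip(atom_list, k):
        pool = sorted(atom - seen)
        if not 0 <= k_A <= len(pool):
            raise ValueError("k_A must lie between 0 and |A minus earlier samples|")
        B.update(rng.sample(pool, k_A))
    return B | {previous_samples[idx] for idx in K}

def conditional_sample(mu, A, rng):
    """Sample from mu conditioned on A; uniform over A if A has zero weight."""
    elems = sorted(A)
    weights = [mu.get(x, 0.0) for x in elems]
    if sum(weights) == 0:
        return rng.choice(elems)
    return rng.choices(elems, weights=weights, k=1)[0]

rng = random.Random(1)
atom_list = [{1, 2, 3, 4}, {5, 6, 7, 8}]
previous = [1, 5]
A_i = draw_conditioning_set(atom_list, previous, k=[2, 1], K=[0], rng=rng)
j_i = conditional_sample({2: 0.5, 7: 0.5}, A_i, rng)
print(sorted(A_i), j_i)
```

In the definition the tester itself never sees `A_i` or `j_i`; only the resulting configuration would be passed back to it.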

Also for adaptive algorithms we need to analyze only the respective core algorithms. 

Observation 6.1.7. An adaptive testing algorithm for a label-invariant property can be con- 
verted to a corresponding core algorithm with the same sample complexity. 

Proof. Again we use a uniformly random permutation $\sigma$ of $\{1,\ldots,n\}$. Regardless of how the
original set $A_i$ was chosen, now it will be chosen uniformly at random among all sets satisfying
the same intersection sizes with the atoms of the partition generated by $A_1,\ldots,A_{i-1}$ and the
same membership relations with $j_1,\ldots,j_{i-1}$. Hence the use of a uniformly drawn set based on
the $k_A$ numbers and $K_i$ is justified, and since $\sigma$ is not revealed to the algorithm, the particular
resulting set $A_i$ is not revealed.

Also, the probability for a particular value of ji now can depend only on the resulting i- 
configuration, and hence it is sufficient to reveal only the configuration to the algorithm - the 
algorithm can then use internal coin tosses to simulate the actual value of ji (uniformly drawing 
it from all values satisfying the same configuration) . The same goes for the decision whether to 
accept or reject in the end. 

To further illustrate the last point, note that the analysis does not change even if we assume
that at every phase, after choosing $A_i$ we also draw a new random permutation, chosen uniformly
at random among all those that preserve $j_1,\ldots,j_{i-1}$ and the atoms of $A_1,\ldots,A_i$ (but can
"reshuffle" each atom internally). Then the "position inside its atom" of $j_i$ will be completely
uniform among those of the same configuration (if the configuration makes it equal to a previous
$j_k$ then there is only one choice for $j_i$ anyway). □

6.2 Uniformity has no constant sample non-adaptive test 

Theorem 6.2.1. Testing uniformity requires at least $\Omega(\log\log n)$ non-adaptive conditional samples
(for some fixed $\epsilon$).

To prove this lower bound, we show that for any fixed t and large enough n, no deterministic 
non-adaptive algorithm can distinguish with probability | between the case where the input 
distribution is the uniform one (with probability 1), and the case where the input distribution 
is drawn according to the even uniblock distribution over distributions. Recall that such a 
deterministic algorithm is in fact given by fixed sets $A_1,\ldots,A_t \subseteq \{1,\ldots,n\}$ and a fixed
acceptance criterion based on the $t$-configuration of the obtained samples (to see this, take a core
non-adaptive testing algorithm and arbitrarily fix its internal coins).

We now analyze the performance of a deterministic non-adaptive tester against the even 
uniblock distribution. Asymptotic expressions are for a fixed t and an increasing n. 

Definition 6.2.2. We call a set $A \subseteq \{1,\ldots,n\}$ large if $|A| \ge n2^{\sqrt{\log n}}/|U|$, where $U$ is the
set chosen in the construction of the even uniblock distribution. We call $A$ small if
$|A| \le n2^{-\sqrt{\log n}}/|U|$.

Lemma 6.2.3. With probability at least $1 - \frac{2^{t+2}}{\sqrt{\log n}}$ over the choice of $U$, all atoms in the partition
generated by $A_1,\ldots,A_t$ are either large or small.

Proof. There is a fixed number of at most $2^t$ atoms. An atom $A$ is neither large nor small if
$n2^{-\sqrt{\log n}} < |A||U| < n2^{\sqrt{\log n}}$. Recall that $|U| = 2^{2k}$ where $\frac{1}{8}\log n \le k \le \frac{3}{8}\log n$ is chosen
uniformly. Therefore, for a fixed $A$, there are at most $\sqrt{\log n}$ values of $k$ which will make it neither
large nor small. Since the range of $k$ is of size $\frac{1}{4}\log n$, we get that with probability at most
$\frac{4}{\sqrt{\log n}}$, $A$ is neither large nor small. Taking the union bound over all atoms gives the
statement of the lemma. □
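
The counting step can be illustrated numerically. The sketch below is hedged: it assumes base-2 logarithms, that $k$ ranges over the integers between $\frac{1}{8}\log n$ and $\frac{3}{8}\log n$, and that an atom is undecided exactly when $n2^{-\sqrt{\log n}} < |A| \cdot 2^{2k} < n2^{\sqrt{\log n}}$. The function name is hypothetical.

```python
import math

def undecided_k_values(n, atom_size):
    """Values of k (with |U| = 2^(2k)) for which an atom of the given size
    is neither large nor small, together with the size of the range of k."""
    log_n = math.log2(n)
    root = math.sqrt(log_n)
    k_min, k_max = math.ceil(log_n / 8), math.floor(3 * log_n / 8)
    # Taking logarithms, the condition reads:
    #   -sqrt(log n) < log|A| + 2k - log n < sqrt(log n)
    offset = math.log2(atom_size) - log_n
    undecided = [k for k in range(k_min, k_max + 1) if -root < offset + 2 * k < root]
    return undecided, k_max - k_min + 1

undecided, range_size = undecided_k_values(2 ** 64, 2 ** 32)
print(undecided, range_size)
```

For $n = 2^{64}$ and an atom of size $2^{32}$ the undecided values form a window of 7 consecutive integers, within the $\sqrt{\log n} = 8$ allowed by the argument, out of a range of 17 values.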

Lemma 6.2.4. With probability at least $1 - 2^{t-\sqrt{\log n}}$, no small atom intersects $U$.

Proof. Given a fixed $k$, for any small set $A$ the probability of it intersecting $U$ is clearly bounded
by $2^{-\sqrt{\log n}}$. We can now conclude the proof by union-bounding over all small atoms, whose
number is bounded by $2^t$. □
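
The bound used in the proof is the union bound $|A| \cdot |U|/n$ over the elements of $A$. As a sanity check, the following sketch (hypothetical function name, exact arithmetic) computes the exact probability that a uniformly random set $U$ of a fixed size meets a fixed set $A$ and compares it with that bound. The parameters are chosen so that $|A| \cdot |U|/n = 2^{-\sqrt{\log n}}$ with base-2 logarithms, which is an assumption about the threshold for "small".

```python
from fractions import Fraction
from math import comb

def prob_intersects(n, a, u):
    """Exact probability that a uniformly random u-subset of {1..n}
    intersects a fixed set of size a."""
    return 1 - Fraction(comb(n - a, u), comb(n, u))

n, a, u = 2 ** 16, 16, 2 ** 8     # a * u / n = 2^-4 and sqrt(log2 n) = 4
p = prob_intersects(n, a, u)
union_bound = Fraction(a * u, n)
print(float(p), float(union_bound))
```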

Lemma 6.2.5. With probability 1 — exp(i — t 2 ), for every large atom A, we have \A D U\ = 
$(1 \pm 2^{-\sqrt{\log n}/4})\,|A| \cdot |U|/n$.

Proof. This is by a large deviations inequality followed by a union bound over all atoms. Note
first that if instead of $U$ we had a uniformly random sequence $u_1,\ldots,u_{2^{2k}}$ (chosen with possible
repetitions), then this would have been covered by Lemma 2.2.2. However, $U$ is a random set
of fixed size instead. For this we appeal to Section 6 of [9], where it is proved that moving from
a Binomial to a Hypergeometric distribution (which corresponds to choosing the set $U$ with the
fixed size) only makes the distribution more concentrated. The rest follows by the fact that $A$
is large enough. □

Now we can take t < -r log log n and put forth the following lemma, which implies that 
the uniblock distribution over distributions is indeed indistinguishable by a deterministic non- 
adaptive core algorithm from the uniform distribution using only t samples. 

Lemma 6.2.6. For t < | log log n, with probability 1 — o(l) ; the distribution over {1, ... ,n} 
obtained from the uniblock distribution over distributions, is such that the resulting distribution 
over the configurations of $j_1,\ldots,j_t$ is $o(1)$-close in the variation distance to the distribution
over configurations resulting from the uniform distribution over $\{1,\ldots,n\}$.

Proof. With probability $1-o(1)$ all of the events in Lemmas 6.2.3, 6.2.4 and 6.2.5 occur. We
prove that in this case the two distributions over configurations are $o(1)$-close. Recall that
the uniform distribution over the set $U$ (resulting from the uniblock distribution) is called the
$U$-distribution. The lemma follows from the following:

• A sample taken from a set $A_i$ that contains only small atoms will be uniform from this set
(and independent of all others), both for the uniform distribution and the $U$-distribution.
For the $U$-distribution it follows from $U$ not intersecting $A_i$ at all (recall that in our model,
a conditional sample with a set of empty weight returns a uniformly random element from
that set).
• A sample taken from a set $A_i$ that contains some large atom will not be identical to
any other sample with probability $1-o(1)$ for both distributions. This follows from the
birthday paradox: Setting $A$ to be the large atom contained in $A_i$, recall that
$|A \cap U| = (1 \pm 2^{-\sqrt{\log n}/4})\,|A| \cdot |U|/n$. This quantity is $\omega(\log^2\log n)$. Thus for a fixed $i$ the probability
for a collision with any other $j$ is $o(1/\log\log n)$ (regardless of whether $A_j$ contains a large
atom), and hence with probability $1-o(1)$ there will be no collision for any $i$ for which
$A_i$ contains a large atom.

• For a set $A_i$ containing a large atom, the distributions over the algebra of the events
$j_i \in A_k$ (which corresponds to the distribution over the atom in the partition generated
by $A_1,\ldots,A_t$ containing $j_i$) are $o(1)$-close for both distributions. To show this we analyze
every atom $A$ generated by $A_1,\ldots,A_t$ that is contained in $A_i$ separately. If $A$ is small, then
for the uniform distribution, $j_i$ will not be in it with probability $1-o(1)$ (a small atom is in
particular of size $o(|A_i|)$ since $A_i$ contains a large atom as well), while for the $U$-distribution
this is with probability 1 (recall that we conditioned on the event of $U$ not intersecting any
small atom). If $A$ is large, then we have $|A \cap U| = (1 \pm 2^{-\sqrt{\log n}/4})\,|A| \cdot |U|/n$, implying
that the probabilities for $j_i \in A$ for the $U$-distribution and the uniform one are only $o(1)$
apart.

The items above allow us to conclude the proof. They mean that for both the $U$-distribution
(conditioned on the events in Lemmas 6.2.3, 6.2.4 and 6.2.5) and the uniform distribution,
the resulting distributions over configurations are $o(1)$-close to the one resulting by setting the
following:

1. For every $i$ for which $A_i$ is small, uniformly pick $j_i \in A_i$ independently of all other random
choices; write down the equalities between these samples and the atoms to which these
samples belong.

2. For every i for which Ai is large, write ji as having no collisions with any other sample; 
then pick the atom containing ji from all atoms contained in Ai according to their relative 
sizes, in a manner independent of all other random choices. 

□ 

Lemma 6.2.6 allows us to conclude the argument by Yao's method.

Lemma 6.2.7. All non-adaptive algorithms taking t < \ log log n conditional samples will fail 
to distinguish the uniform distribution from the even uniblock distribution over distributions 
(which are all $\frac{1}{2}$-far from uniform) with any probability more than $o(1)$.

Proof. By Observation 6.1.5 it is enough to consider core non-adaptive algorithms, and by Yao's
argument it is enough to consider deterministic ones.

For any deterministic non-adaptive core algorithm (characterized by $A_1,\ldots,A_t$ and a function
assigning a decision to every possible configuration), the even uniblock distribution with
probability $1-o(1)$ will choose a $U$-distribution, which in turn will induce a distribution over
configurations that is $o(1)$-close to that induced by the uniform distribution over $\{1,\ldots,n\}$.
This means that if we look at the distribution over configurations caused by the even uniblock
distribution over distributions itself, it will also be $o(1)$-close to the one induced by the uniform
distribution. Therefore the acceptance probabilities of the algorithm for both distributions over
distributions are $o(1)$-close. □

It would be interesting to make the bound on the number of samples into a power of logn, 
possibly by trying to analyze the sets Ai in themselves rather than through their generated 
partition. 
6.3 A label-invariant property with no constant sample adaptive test

Theorem 6.3.1. There exists a label invariant property such that any adaptive testing algorithm
for it must use at least $\Omega(\sqrt{\log\log n})$ conditional samples (for some $\epsilon$).

The property will be that of the distribution being the possible result of the even uniblock 
distribution over distributions. In other words, it is the property of being equal to the
$U$-distribution over some set $U$ of size $2^{2k}$ for some $\frac{1}{8}\log n \le k \le \frac{3}{8}\log n$.

We show that no "deterministic" adaptive core algorithm can distinguish between the even
and odd uniblock distributions using $o(\sqrt{\log\log n})$ samples, while by Observation 6.1.2 a proper
$\frac{1}{2}$-test must distinguish between these. Considering such algorithms, we first note that they can
be represented by decision trees, where each node of height $i$ corresponds to an $(i-1)$-configuration
of the samples made so far. An internal node describes a new sample, through the numbers $k_A$
provided for every atom $A$ of $A_1,\ldots,A_i$ (where the atoms are labeled by their operations, as
the $A_i$ themselves are not revealed to the algorithm), and the set $K_i$ (all these parameters could
be different for different nodes of height $i$). A leaf is labeled with an accept or reject decision.

The basic ideas of the analysis are similar to those of the previous subsection, but the 
analysis itself is more complex because we have to consider the "partition generated by the 
samples so far" in every step of the algorithm. 

First thing to note is that there are not too many nodes in the decision tree. 

Observation 6.3.2. The number of nodes in a decision tree corresponding to a $t$-sample algorithm
is less than $t2^{2t^2}$.

Proof. A configuration can be described by assigning each of the $i$ samples a vector of
length $2i$ indicating which sets it belongs to and which of the other samples it is
equal to. This gives an $i \times 2i$ binary matrix, where every possible $i$-configuration for $i$ samples
corresponds to some such matrix. That gives us at most $2^{2i^2}$ possible $i$-configurations. Summing
for all $i \le t$ gives the bound in the statement. □
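
The summation in the proof can be checked directly. The sketch below assumes the bound reads $t2^{2t^2}$, with $2^{2i^2}$ counting the $i \times 2i$ binary matrices; the function names are hypothetical.

```python
def configurations_bound(i):
    """Number of i x 2i binary matrices, an upper bound on i-configurations."""
    return 2 ** (2 * i * i)

def nodes_bound(t):
    """Sum of the per-height bounds over all heights i = 1..t."""
    return sum(configurations_bound(i) for i in range(1, t + 1))

# Compare the sum with t * 2^(2 t^2) for a few values of t.
table = {t: (nodes_bound(t), t * 2 ** (2 * t * t)) for t in range(2, 7)}
for t, (total, bound) in table.items():
    print(t, total < bound)
```

The sum is dominated by its last term, so the factor $t$ in the bound is generous for $t \ge 2$.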

From now on we will always assume that n is larger than an appropriate fixed constant. 
For the analysis, we consider two input distributions as being drawn at once, one according to 
the even uniblock distribution and the other according to the odd uniblock distribution. We 
first choose $\frac{1}{8}\log n \le k \le \frac{3}{8}\log n$ uniformly at random, and then uniformly choose a set $U$ of
size $2^{2k}$ and a set $U'$ of size $2^{2k+1}$. We then set $\mu$ to be the $U$-distribution and $\mu'$ to be the
$U'$-distribution.

We will now show that the fixed decision tree accepts with almost the same probability when
given either $\mu$ or $\mu'$, which will allow us to conclude the proof using Yao's argument. We start
with a notion of "large" and "small" similar to the one used for non-adaptive algorithms, only 
here we need it for the numbers themselves. 

Definition 6.3.3. We call a number $b$ large with respect to $U$ if $b \ge n2^{\sqrt{\log n}}/|U|$. We call $b$
small with respect to $U$ if $b \le n2^{-\sqrt{\log n}}/|U|$. We make the analogous definitions with respect
to $U'$.

Lemma 6.3.4. With probability at least 1— , oil "kA" numbers appearing in the decision 

tree are either small with respect to both U and U' , or large with respect to both U and U' . 

Proof. By Observation 6.3.2 the total of different "$k_A$" numbers is no more than $t2^{3t^2}$ (the
number of nodes times $2^t$, the bound on the size of the partition generated by $A_1,\ldots,A_i$ in
every node). We can conclude similarly to the proof of Lemma 6.2.3 that since $|U|$ and $|U'|$
differ by a factor of 2, there are at most $\sqrt{\log n}$ values of $k$ for which some fixed number $k_A$
will not be either large with respect to both or small with respect to both. The bound in the
statement then follows by union bound. □






From now on we assume that the event of Lemma 6.3.4 has occurred, and fix $k$ (that is, the
following will hold not only for the entire distributions, but also for the conditioning on every
specific $k$ for which the event of Lemma 6.3.4 is satisfied). The following lemma is analogous
to the non-adaptive counterparts Lemma 6.2.4 and Lemma 6.2.5, but here it is proved by
induction for every node that is reached while running the decision tree over the distribution
drawn according to either $\mu$ or $\mu'$, where the inductive argument requires both statements to
hold. This lemma will essentially be used as a warm-up, since the final proof will refer to the
proof and not just the statement of the lemma.



Lemma 6.3.5. Assuming t < ^ log log n and conditioned on that the events of Lemma 6. 3.4\ 
have occurred, for every 1 <i <t, with probability at least 1 — -^j== , the following occur. 

• All small atoms in the partition generated by $A_1,\ldots,A_i$ contain no members of either $U$
or $U'$ outside (possibly) $\{j_1,\ldots,j_{i-1}\}$.

• For every large atom B in the partition generated by A\, . . . , Ai, we have both \B D U\ = 

i ± ivfe) \ B \ -\ u \/ n and \ B n u '\ = { 1 ± ^mm) \ B \ ■ PVn. 

Proof. We shall prove the lemma not only conditioned on the event of Lemma 6.3.4 but also
conditioned on any fixed $|U|$ (and $|U'| = 2|U|$) for which Lemma 6.3.4 is satisfied. We assume
by induction that this occurs for the atoms in the partition generated by A±, . . . , Ai— i with 
probability at least 1 — ^ j and prove it for A\ , . . . , Ai with probability at least 1 — = . 
Recall that the way $A_i$ is generated, the algorithm in fact specifies how many members of it
will appear in $A \setminus \{j_1,\ldots,j_{i-1}\}$ for every atom $A$ of the partition generated by $A_1,\ldots,A_{i-1}$
(while specifying exactly which of $j_1,\ldots,j_{i-1}$ will appear in it), and then the actual set is drawn
uniformly at random from those that satisfy it.

We show the conclusion of the lemma to hold even if $U$ and $U'$ are held fixed (as long as
they satisfy the induction hypothesis and their sizes satisfy the assertion of Lemma 6.3.4). Let
$B$ be an atom of $A_1,\ldots,A_i$ and let $A$ be the atom of $A_1,\ldots,A_{i-1}$ so that $B \subseteq A$. We have
several cases to consider, conditioned on the fact that the event in the statement does occur for
$i-1$.



If $A$ is small, then so is $B$. By the induction hypothesis $A \setminus \{j_1,\ldots,j_{i-1}\}$ has no members
of $U$ or $U'$, and hence neither does $B \setminus \{j_1,\ldots,j_{i-1}\}$. This happens with (conditional) probability 1.

If A is large but B is small, by the induction hypothesis both \ Af] U\ = ( 1 ± J|~^ 4 J \ A\ ■ 
\U\/n and \AnU'\ = ( 1 ± -fe^j) \A\ ■ \U'\/n. When this happens, as B \ {ji, . . . 



2>/log n/4 

is in fact chosen uniformly from all subsets of $A \setminus \{j_1,\ldots,j_{i-1}\}$ of the same size (either $k_A$
or $|A \setminus \{j_1,\ldots,j_{i-1}\}| - k_A$), and since $B$ is small, we can use a union bound to see that

no member of either $U$ or $U'$ is taken into $B$ with probability $1 - 2^{1-\sqrt{\log n}}$.

If B is large (and hence so is A), then again by the induction hypothesis both \A Pi U\ = 
1 ± -gyj |^| • \U\/n and \A n U'\ = ( 1 ± ^Mjl) \A\ ■ \U'\/n. We also note that 



2\/log n 1 4 J I I I 1/ I I \ 2%/^°s n/4 

1/2 

since B is large we have in particular t < 2%/ | ogn / 4 \ B\. We can now use a large deviation 
inequality (as in Lemma 6.2.5) to conclude the bounds for $|B \cap U|$ and $|B \cap U'|$ with
probability $1 - 2\exp(-2^{\sqrt{\log n}/2-2})$.

Thus in all cases the statement will not hold with probability at most ^ for n large enough. 
By taking the union bound over all possibilities for B (up to 2* events in total) we get that 
with probability 1 − the statement of the lemma holds for $A_1,\ldots,A_i$, conditioned on the

event occurring for $A_1,\ldots,A_{i-1}$. A union bound with the event of the induction hypothesis
happening for $A_1,\ldots,A_{i-1}$ gives the required bound. □

We now prove the lemma showing the indistinguishability of fi from /jf whenever t < 



log log n, conditioned on the event of Lemma 6.3.4. We assume without loss of generality
that the decision tree of the algorithm is full and balanced, which means that the algorithm 
will always take t samples even if its output was already determined before they were taken. 



Lemma 6.3.6. Assuming that t < y 7^ log log n and that the event of Lemma \ 6.3.4\ has oc- 
curred, consider the resulting distributions of which of the leaves of the algorithm was reached. 
These two distributions, under [i compared to under y! , are at most apart from each other. 

Proof. The proof is reminiscent of the proof of Lemma 6.2.6 above, but requires more cases to
be considered, as well as induction over the height of the node. Denoting this height by i, we 
shall prove by induction that the distributions over which of the height i nodes was reached, 
under \i compared to //, are only are at most 1 — ^== apart from each other. 

We shall use the induction hypothesis that the corresponding distributions over the node of 
height i—1 (the parent of the node that we consider now) are at most 1 — ^ n apart, and then 
show that the variation distance between the distributions determining the transition from a 
particular parent to the child node is no more than yj== ; which when added to the difference 
in the distributions over the parent nodes gives required bound. 

The full induction hypothesis will include not only the bound on the distributions of the 
parent nodes, but also a host of other assumptions, that we prove along to occur with probability 
at least 1 — -^==. 111 particular, instead of using the statement of Lemma l6.3.5( we essentially 
re-prove it here. So the induction hypothesis also includes that all of the events proved during
the inductive proof of Lemma 6.3.5 hold here with respect to $A_1,\ldots,A_{i-1}$. Also, as in the
proof of Lemma 6.3.5, the conditional probability of them not holding for $A_1,\ldots,A_i$ is at most
n (by the union bound done there for every atom generated by Ax,... ,Ai of the event 
of the hypothesis failing for any single atom A). Therefore, we assume that additionally the 
inductive hypothesis used in the proof of Lemma 16.3.51 has occurred for Ax , . . . , Ai , and prove 
that with probability at least 1 — ^ n au other assertions of the inductive hypothesis occur as 
well as that the variation distance between the distributions over the choice of the child node 
is at most ? By a union bound argument (and for the variation distance, a "common 

Vlogn o \ * 

large probability event" argument), this will give us the 1 — bound that we need for the 

induction. Recall that the choice of child node depends deterministically on the question of
which atom of $A_1,\ldots,A_i$ contains the obtained sample $j_i$, so in fact we will bound the distance
between the distributions of the atom in which $j_i$ has landed.

Additionally, we define by induction over $i$ the following notion: An index $i$ is called smallish
if all the "$k_A$" numbers relating to it are small, and additionally $K_i$ contains only smallish
indexes (recall that $K_i \subseteq \{1,\ldots,i-1\}$). A final addition to our induction hypothesis is that
with probability at least 1 — ^ = , in addition to all our other assertions, the following occur 
for every i' < i. 

• The sample $j_{i'}$ is in $U$ or respectively $U'$ if and only if $i'$ is not smallish (note that the
assignment of smallish indexes depends on the parent node).

• If $i'$ is not smallish but all its corresponding "$k_A$" numbers are small, then $j_{i'}$ is equal to
some $j_l$ where $l$ is a non-smallish index smaller than $i'$.

• If there exists a large number $k_A$ for $i'$, then $j_{i'}$ is not equal to $j_l$ for any $l < i'$, and
additionally $j_{i'}$ lies in some atom $A'$ for which the corresponding $k_{A'}$ is not small (it is
allowed that $A' = A$).

We now work for every possible parent node of height $i-1$ separately. Note that we
restrict our attention to nodes whose corresponding $(i-1)$-configurations satisfy the induction
hypothesis. Recall that we assume that the induction hypothesis in the proof of Lemma 6.3.5
has occurred for Ai, . . . ,A^, and aim for a ^ n "failure probability" bound. We separate to 
cases according to the nature of A±, . . . , Ai. 

• A sample taken from a set $A_i$, where $i$ is smallish, will be uniform and independent of
other samples, for both the $U$-distribution and the $U'$-distribution. Moreover, this $j_i$ in
itself will not be a member of $U$ or respectively $U'$. This is since $A_i \setminus \{j_k : k \in K_i\}$ does
not intersect $U$ or $U'$, while using the induction hypothesis for $\{j_k : k \in K_i\}$ (so also $A_i$
does not intersect $U$ or $U'$). So conditioned on the entire induction hypothesis for $i-1$
and the hypothesis in the proof of Lemma 6.3.5 for $A_1,\ldots,A_i$, all assertions for $i$ will
occur with probability 1, and the distributions for selecting the height $i$ node given this
particular parent node are identical under either $\mu$ or $\mu'$.

• A sample taken from a set $A_i$, where the $k_A$ numbers are all small but $i$ is not smallish,
will be a member of $U$ or respectively $U'$, chosen uniformly (and independently) from
$\{j_k : k \in K'_i\}$, where $K'_i$ denotes the (non-empty) set of all non-smallish indexes in $K_i$.
This is since $\{j_k : k \in K'_i\}$ is exactly the set of members of $U$ or respectively of $U'$ in $A_i$
(by the hypothesis for $A_1,\ldots,A_i$ there will be no member of $U$ or $U'$ in $A_i \setminus \{j_k : k \in K_i\}$,
and the rest follows from the induction hypothesis concerning smallish indexes). Again
the assertions for $i$ follow with probability 1 (conditioned on the above hypotheses), and
the distributions for selecting the height $i$ node are identical.

• If a sample is taken from $A_i$ where at least one of the $k_A$ numbers is not small, then the
following occur.



— Since A4 in particular contains the atom A, and both \A Pi U\ = ( 1 ± 2 vio en /4. j l^-l ' 
\U\/n and \A D U'\ = (l ± 2V J gn/4 ) \A\ ■ \U'\/n by the assertion over A 1: ... ,Ai 
relating to Lemma [6.3.51 we note that in particular i = °( ^ logn |A H U\) and i = 
o(—f==\AiC\U'\), so with probability less than (for n larger than some constant) 
we will get under either /i or // a sample that is identical to a prior one. 

— By the assertion over A\, . . . , Ai, an atom B inside Ai for which the corresponding 
$k_B$ is small will not contain a member of $U$ or $U'$, and so $j_i$ will not be in such an
atom (in the preceding item we have already established that there are members of
$U$ and respectively $U'$ in $A_i$).

– By the assertion over $A_1, \ldots, A_i$, for every large atom $B$ inside $A_i$ we have both
$|B \cap U| = (1 \pm 2^{-\sqrt{\log n}/4})\,|B| \cdot |U|/n$ and $|B \cap U'| = (1 \pm 2^{-\sqrt{\log n}/4})\,|B| \cdot |U'|/n$, implying
that $\frac{|B \cap U|}{|U|} = (1 \pm 2^{-\sqrt{\log n}/5}) \frac{|B \cap U'|}{|U'|}$ (for large enough $n$). Also, every small atom $C$
inside $A_i$ contains no members of $U$ or $U'$, so summing over all atoms of $A_i$ we obtain
$\frac{|A_i \cap U|}{|U|} = (1 \pm 2^{-\sqrt{\log n}/5}) \frac{|A_i \cap U'|}{|U'|}$, and thus for every atom $B$ of $A_i$ (large or small) we
finally have $\frac{|B \cap U|}{|A_i \cap U|} = (1 \pm 2^{-\sqrt{\log n}/6}) \frac{|B \cap U'|}{|A_i \cap U'|}$ (for small atoms both sides are zero).
The final thing to note is that $\frac{|B \cap U|}{|A_i \cap U|}$ and respectively $\frac{|B \cap U'|}{|A_i \cap U'|}$ equal the probabilities of
obtaining a sample from $B$ under $\mu$ and respectively $\mu'$. Summing over all atoms
contained in $A_i$ (of which there are $2^{i-1}$) we obtain a difference over these distributions
that is bounded by $[\ldots]$, which satisfies the requirements (also after conditioning
on the event that the rest of the induction hypothesis holds).

Having covered all cases, this completes the proof that the inductive hypothesis carries over to $i$,
and thus the proof of the lemma. □

Now we can conclude the argument by Yao's method to prove the following lemma that 
implies the theorem. 



Lemma 6.3.7. All adaptive algorithms taking $t < [\ldots]\sqrt{\log \log n}$ conditional samples will fail to
distinguish the even uniblock distribution over distributions from the odd one (whose outcomes
are always $[\ldots]$-far from those of the even distribution) with any probability more than $o(1)$.

Proof. By Observation 6.1.7 it is enough to consider only core adaptive algorithms, and then
by Yao's argument it is enough to consider "deterministic" ones (the quote marks are because
the external coin tosses are retained as per the definitions above). We now consider the decision
tree of such an algorithm, and feed to it either $\mu$ or $\mu'$ that are drawn as per the definition
above. With probability at least $1 - [\ldots] = 1 - o(1)$ the event of Lemma 6.3.4 has occurred,
and conditioned on this event (or even if we condition on particular $U$ and $U'$), Lemma 6.3.6
provides that the variation distance between the resulting distributions over the leaves is at most
$[\ldots] = o(1)$. In particular this bounds the difference between the (conditional) probabilities
of the event of reaching an accepting leaf of the algorithm.

Since we have an $o(1)$ difference when conditioned on a $1 - o(1)$ probability event, we
also have an $o(1)$ difference on the unconditioned probability of reaching an accepting leaf
under $\mu$ compared to $\mu'$. This means that the algorithm cannot distinguish between the two
corresponding distributions over distributions. □



7 A lower bound for testing general properties of distributions 

For properties that are not required to be label-invariant, near-maximal non-testability can
occur even when conditional samples are allowed.

Theorem 7.0.1. Some properties of distributions on $[n]$ require $\Omega(n)$ conditional samples to
test (adaptive or not).

We assume $n$ is even. To prove Theorem 7.0.1 we reduce the problem of testing general
$n/2$-bit binary string properties $P \subseteq \{0,1\}^{n/2}$ to the problem of testing properties of distributions
over $[n]$ with conditional samples. The reduction is probabilistic, succeeding with probability
$1 - o(1)$, and only incurs an additional $O(1)$ factor in the query complexity, that is, each
conditional sample made by the distribution tester is translated into (expected) $O(1)$ queries
to the input binary string $x \in \{0,1\}^{n/2}$. Then the lower bound follows by the existence of
hard-to-test properties $P \subseteq \{0,1\}^{n/2}$ that require $\Omega(n)$ queries to test (see e.g. [6]).



7.1 The Reduction 

We start with a few definitions. A string $y \in \{0,1\}^n$ is balanced if it has the same number of
0s and 1s (in particular we assume here that $n$ is even). For $x \in \{0,1\}^{n/2}$, let $b(x) \in \{0,1\}^n$ be
the string obtained by concatenating $x$ with its bitwise complement (in which each original bit
of $x$ is flipped). Clearly $b(x)$ is balanced for all $x$.

For a property $P \subseteq \{0,1\}^{n/2}$, define $b(P) \subseteq \{0,1\}^n$ as $b(P) = \{b(x) : x \in P\}$.

Observation 7.1.1. For all $x, y \in \{0,1\}^{n/2}$, $d(x, y) = d(b(x), b(y))$.

Proof. Follows from the fact that if $x$ and $y$ differ in $d(x, y) \cdot \frac{n}{2}$ entries, then $b(x)$ and $b(y)$ differ
in $d(x, y) \cdot n$ entries. □
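
As a concrete illustration of the encoding $b(\cdot)$ and of Observation 7.1.1, the following minimal
Python sketch builds $b(x)$ and compares normalized Hamming distances on a small example. The
function names `b` and `hamming` and the sample strings are illustrative assumptions and do not
appear in the paper.

```python
from fractions import Fraction

def b(x):
    """Concatenate the bit list x with its bitwise complement."""
    return list(x) + [1 - bit for bit in x]

def hamming(u, v):
    """Normalized Hamming distance between two equal-length bit lists."""
    assert len(u) == len(v)
    return Fraction(sum(a != c for a, c in zip(u, v)), len(u))

# Hypothetical example strings of length n/2 = 4.
x = [0, 1, 1, 0]
y = [1, 1, 0, 0]
bx, by = b(x), b(y)

# b(x) is balanced: exactly half of its entries are 1.
is_balanced = sum(bx) * 2 == len(bx)
# The normalized distance is preserved by the encoding.
same_distance = hamming(x, y) == hamming(bx, by)
```

The number of differing entries doubles under $b$, and so does the length, which is why the
normalized distance is unchanged.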

Observation 7.1.2. For all $P$ and $\epsilon > 0$, $\epsilon$-testing $b(P)$ requires at least as many queries as
$\epsilon$-testing $P$.

Proof. This is since we can simulate the tester for $b(P)$ also for a non-balanced string
$x \in \{0,1\}^{n/2}$, where a query for an index $i \le n/2$ would return $x_i$ and for $i > n/2$ the query would
return $1 - x_{i-n/2}$. □
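
The simulation in this proof can be sketched as follows. This is only an illustration under
stated assumptions: indices are 0-based (the paper uses $[n] = \{1, \ldots, n\}$), and the class
`CountingOracle` and the function `query_b` are hypothetical names introduced here.

```python
class CountingOracle:
    """Query access to a hidden bit string x, counting the queries made."""

    def __init__(self, x):
        self._x = list(x)
        self.queries = 0

    def query(self, i):
        self.queries += 1
        return self._x[i]

def query_b(oracle, half, i):
    """Answer a query to position i of b(x) using one query to x.

    `half` is the length of x (n/2 in the paper); positions are 0-based,
    so the first half is i < half and the second half is i >= half.
    """
    if i < half:
        return oracle.query(i)
    return 1 - oracle.query(i - half)

# Hypothetical input string of length n/2 = 3.
oracle = CountingOracle([0, 1, 1])
answers = [query_b(oracle, 3, i) for i in range(6)]
```

Each query to $b(x)$ costs exactly one query to $x$, so a tester for $b(P)$ yields a tester for $P$
with the same query complexity.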

Next, for every balanced string $x \in \{0,1\}^n$ we define a distribution $\mu_x$ on $[n]$ as follows:

• If $x_i = 0$ then $\mu_x(i) = \frac{1}{2n}$.

• If $x_i = 1$ then $\mu_x(i) = \frac{3}{2n}$.

Note that since $x$ is balanced $\mu_x$ is indeed a distribution as $\sum_{i=1}^{n} \mu_x(i) = 1$.

Extending this definition further, for every property $P \subseteq \{0,1\}^{n/2}$ we define a property $\mathcal{P}_P$
of distributions over $[n]$ as follows: $\mathcal{P}_P = \{\mu_{b(x)} : x \in P\}$.

Observation 7.1.3. For all $x, y \in \{0,1\}^{n/2}$, $d(b(x), b(y)) = 2 \cdot d(\mu_{b(x)}, \mu_{b(y)})$, where the first
distance refers to the normalized Hamming distance between binary strings, and the second is
the variation distance between distributions.

Proof. This follows by direct calculation. □ 
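
The direct calculation can be checked on a small instance with exact arithmetic. The sketch
below assumes the weights $\frac{1}{2n}$ for 0-entries and $\frac{3}{2n}$ for 1-entries, which are the values
consistent with the 3:1 ratio used in the Sampler analysis and with normalization on balanced
strings; the function names and example strings are illustrative only.

```python
from fractions import Fraction

def b(x):
    """Concatenate x with its bitwise complement."""
    return list(x) + [1 - bit for bit in x]

def mu(y):
    """Distribution on positions of a balanced string y of length n.

    Assumed weights: 1/(2n) where y_i = 0 and 3/(2n) where y_i = 1.
    """
    n = len(y)
    return [Fraction(3, 2 * n) if bit else Fraction(1, 2 * n) for bit in y]

def hamming(u, v):
    """Normalized Hamming distance between bit lists."""
    return Fraction(sum(a != c for a, c in zip(u, v)), len(u))

def variation(p, q):
    """Variation distance: half the L1 distance."""
    return sum(abs(a - c) for a, c in zip(p, q)) / 2

x = [0, 1, 1, 0]
y = [1, 1, 0, 0]
bx, by = b(x), b(y)
total_mass = sum(mu(bx))
lhs = hamming(bx, by)
rhs = 2 * variation(mu(bx), mu(by))
```

Each differing entry contributes $\frac{1}{n}$ to the normalized Hamming distance and
$\frac{1}{2} \cdot \frac{2}{2n} = \frac{1}{2n}$ to the variation distance, which gives the factor of 2.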

Theorem 7.0.1 follows from the following extension of Observation 7.1.2.

Lemma 7.1.4. For all $P$ and $\epsilon > 0$, if $\epsilon$-testing $P$ with success probability 3/5 requires at least
$q$ queries, then $\epsilon/2$-testing $\mathcal{P}_P$ with success probability 2/3 requires at least $q/100$ conditional
samples.

Proof. By Observation 7.1.3, for all $x \in \{0,1\}^{n/2}$, if $x \in P$ then $\mu_{b(x)} \in \mathcal{P}_P$, and if $d(x, P) > \epsilon$
then $d(\mu_{b(x)}, \mathcal{P}_P) > \epsilon/2$. Now we show how to reduce the task of testing $P$ to testing $\mathcal{P}_P$. Let
$T$ be a tester for $\mathcal{P}_P$ making at most $q/100$ conditional samples. Given an oracle access to
the input string $x \in \{0,1\}^{n/2}$, which is to be tested for membership in $P$, we simulate each
conditional sample on a set $Q \subseteq [n]$ from $\mu_{b(x)}$ made by $T$ as follows:

Sampler

1. Pick $i \in Q$ uniformly at random. If $i \le n/2$ query $x_i$ and set $v_i \leftarrow x_i$. Else, query $x_{i-n/2}$
and set $v_i \leftarrow 1 - x_{i-n/2}$.

2. If $v_i = 1$, output $i$.

3. Else, with probability 1/3 output $i$, and with the remaining probability go to Step 1.
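
The three steps above can be sketched as a rejection-sampling loop. This is a minimal
illustration, not a definitive implementation: indices are 0-based, the function name `sampler`
and its signature are assumptions made here, and the function also returns the number of
rounds so that the query cost can be inspected.

```python
import random

def sampler(x, Q, rng):
    """Simulate one conditional sample from mu_{b(x)} restricted to Q.

    x   : bit list of length n/2 (queried one position per round)
    Q   : non-empty list of 0-based indices in range(2 * len(x))
    rng : a random.Random instance
    Returns (i, rounds), where rounds is the number of queries made to x.
    """
    half = len(x)
    rounds = 0
    while True:
        rounds += 1
        i = rng.choice(Q)                           # Step 1: uniform index in Q
        v = x[i] if i < half else 1 - x[i - half]   # bit of b(x) at position i
        if v == 1:                                  # Step 2: always accept a 1-entry
            return i, rounds
        if rng.random() < 1 / 3:                    # Step 3: accept a 0-entry w.p. 1/3
            return i, rounds
        # otherwise repeat from Step 1

rng = random.Random(0)
# Hypothetical example: x = [1, 0], so b(x) = [1, 0, 0, 1].
# Q = [0, 1] holds one 1-entry and one 0-entry; target probabilities are 3/4 and 1/4.
draws = [sampler([1, 0], [0, 1], rng)[0] for _ in range(20000)]
freq_of_one_entry = draws.count(0) / len(draws)
```

On this example the empirical frequency of index 0 should be close to 3/4, matching the
3:1 weight ratio between 1-entries and 0-entries.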
It is clear that whenever Sampler outputs $i$ with $v_i = 1$, then $i$ is distributed uniformly among
all indices $\{j \in Q : v_j = 1\}$. The same is true for $i$ such that $v_i = 0$. So, to convince ourselves that
Sampler simulates conditional samples correctly, we only need to prove that the ratio between
the probability of outputting $i$ with $v_i = 1$ and the probability of outputting $i$ with $v_i = 0$ is
correct.

Let $q_1 = |\{i \in Q : v_i = 1\}|$ and $q_0 = |\{i \in Q : v_i = 0\}|$. According to our distribution $\mu_{b(x)}$,
the distribution of indices in $Q$ corresponding to the conditional sample is as follows:

• $\Pr[i] = \frac{3}{3q_1 + q_0}$ if $v_i = 1$.

• $\Pr[i] = \frac{1}{3q_1 + q_0}$ if $v_i = 0$.

In particular, the probability of selecting $i$ such that $v_i = 1$ is $3q_1/q_0$ times the probability of
selecting $i$ with $v_i = 0$.

Let us now analyze the probability with which Sampler outputs (eventually) an index
$i \in Q$ with $v_i = 1$, and with $v_i = 0$, respectively. At every round, an index $i$ with $v_i = 1$ is output
with probability $\frac{q_1}{q_1 + q_0}$, and an index $i$ with $v_i = 0$ is output with probability $\frac{q_0}{3(q_1 + q_0)}$. With the
remaining probability (of $\frac{2q_0}{3(q_1 + q_0)}$) no index is output, and the process repeats independently
of all previous rounds. Hence the ratio of the probability of outputting $i$ such that $v_i = 1$ to
the probability of outputting $i$ with $v_i = 0$ is $3q_1/q_0$, as required. Note also that the expected
number of rounds (and so queries to $x$) per one execution of Sampler is $\left(1 - \frac{2q_0}{3(q_1 + q_0)}\right)^{-1} \le 3$.
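
The per-round probabilities, their ratio, and the bound on the expected number of rounds can
be verified with exact rational arithmetic. The helper names `round_probabilities` and
`expected_rounds` below are illustrative assumptions; the formulas are the ones stated in the
analysis above.

```python
from fractions import Fraction

def round_probabilities(q1, q0):
    """Per-round probabilities for Sampler on a set with q1 one-entries and q0 zero-entries.

    Returns (p_one, p_zero, p_retry): the probability of outputting a 1-entry,
    of outputting a 0-entry, and of outputting nothing in a single round.
    """
    total = q1 + q0
    p_one = Fraction(q1, total)         # picked a 1-entry, always output
    p_zero = Fraction(q0, 3 * total)    # picked a 0-entry, output w.p. 1/3
    p_retry = 1 - p_one - p_zero        # picked a 0-entry, rejected w.p. 2/3
    return p_one, p_zero, p_retry

def expected_rounds(q1, q0):
    """Mean of the geometric number of rounds until some index is output."""
    _, _, p_retry = round_probabilities(q1, q0)
    return 1 / (1 - p_retry)

# Check the claimed identities on a grid of small (q1, q0) values with q0 >= 1.
checks = []
for q1 in range(0, 6):
    for q0 in range(1, 6):
        p_one, p_zero, p_retry = round_probabilities(q1, q0)
        checks.append(p_one / p_zero == Fraction(3 * q1, q0))
        checks.append(p_retry == Fraction(2 * q0, 3 * (q1 + q0)))
        checks.append(expected_rounds(q1, q0) <= 3)
all_hold = all(checks)
```

The worst case is $q_1 = 0$, where every round succeeds with probability exactly 1/3 and the
expected number of rounds equals 3.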

The last ingredient in the reduction is a total-query counter, which makes sure that the
number of queries to $x$ does not exceed $q$ (the lower bound). If the count would exceed $q$, the
reduction fails. Since Sampler is called at most $q/100$ times (the query complexity of $T$), and each
call makes at most 3 queries in expectation, a $3/100 < 1/15$ bound
on the failure probability follows by Markov's inequality, and we are done (the bound on the
success probability follows even if we assume that the distribution tester "magically" guesses
the correct answer whenever the reduction to the string property fails). □
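
The arithmetic behind the final step can be spelled out as follows. This sketch only restates
the Markov bound with exact fractions; the function name `failure_bound` and its parameters
are hypothetical and introduced for illustration.

```python
from fractions import Fraction

def failure_bound(q, calls_per_q=Fraction(1, 100), rounds_bound=3):
    """Markov bound on the probability that the query counter exceeds q.

    The expected total number of queries is at most
    rounds_bound * (calls_per_q * q); Markov's inequality divides this by q.
    """
    expected_queries = rounds_bound * calls_per_q * q
    return expected_queries / q

bound = failure_bound(1000)
# Success probability of the simulated tester, after subtracting the failure bound.
success_lower_bound = Fraction(2, 3) - bound
meets_target = success_lower_bound >= Fraction(3, 5)
```

Since $2/3 - 1/15 = 3/5$ and $3/100 < 1/15$, the resulting tester for $P$ succeeds with
probability at least 3/5 while making at most $q$ queries.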



References 

[1] Noga Alon and Joel H. Spencer. The probabilistic method. Wiley-Interscience Series in 
Discrete Mathematics and Optimization. John Wiley & Sons Inc., Hoboken, NJ, third 
edition, 2008. 

[2] Tugkan Batu, Sanjoy Dasgupta, Ravi Kumar, and Ronitt Rubinfeld. The complexity of 
approximating the entropy. SIAM J. Comput., 35(1):132-150, 2005. 

[3] Tugkan Batu, Lance Fortnow, Eldar Fischer, Ravi Kumar, Ronitt Rubinfeld, and Patrick 
White. Testing random variables for independence and identity. In Bob Werner, editor, 
Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (FOCS- 
01), pages 442-451, Los Alamitos, CA, October 14-17 2001. 

[4] Tugkan Batu, Lance Fortnow, Ronitt Rubinfeld, Warren D. Smith, and Patrick White. 
Testing closeness of discrete distributions. CoRR, abs/1009.5397, 2010. Extended abstract 
appeared in the proceedings of the 41st Annual Symposium on Foundations of Computer 
Science (FOCS-00), pages 259-269. 

[5] Tugkan Batu, Ravi Kumar, and Ronitt Rubinfeld. Sublinear algorithms for testing mono- 
tone and unimodal distributions. In Proceedings of the 36th Annual ACM Symposium on 
Theory of Computing, pages 381-390, New York, 2004. 






[6] Oded Goldreich, Shafi Goldwasser, and Dana Ron. Property testing and its connection to 
learning and approximation. J. ACM, 45(4):653-750, 1998. 

[7] Oded Goldreich and Dana Ron. On testing expansion in bounded-degree graphs. In Oded 
Goldreich, editor, Studies in Complexity and Cryptography, volume 6650 of Lecture Notes 
in Computer Science, pages 68-75. Springer, 2011. 

[8] Sudipto Guha, Andrew McGregor, and Suresh Venkatasubramanian. Sublinear estimation 
of entropy and information distances. ACM Transactions on Algorithms, 5(4), 2009. 

[9] Wassily Hoeffding. Probability inequalities for sums of bounded random variables. J. Amer. 
Statist. Assoc., 58:13-30, 1963. 

[10] Reut Levi, Dana Ron, and Ronitt Rubinfeld. Testing properties of collections of distribu- 
tions. In Bernard Chazelle, editor, Proceedings of the 1st Symposium on Innovations in 
Computer Science (ICS-10), pages 179-194, Beijing, China, January 5-7 2010. 

[11] Sofya Raskhodnikova, Dana Ron, Amir Shpilka, and Adam Smith. Strong lower bounds 
for approximating distribution support size and the distinct elements problem. SIAM J. 
Comput., 39(3):813-842, 2009. 

[12] Paul Valiant. Testing symmetric properties of distributions. SIAM J. Comput., 40(6):1927- 
1968, 2011. 






